Page 1: Logistic Regression Model

A PROJECT ON

APPLICATION OF LOGISTIC REGRESSION MODEL IN PREDICTING

BANK CREDIT RISK

(A Case Study of Fidelity Bank PLC)

BY

DAVID PETER ILESANMI

(129075018)

AN ASSIGNMENT

SUBMITTED TO

DR M. ADAMU-IRIA

STATISTICAL PACKAGES

MAT 829

MAY 2013

TABLE OF CONTENTS

Abstract

1.0 INTRODUCTION
1.1 Overview and History of the Bank
1.2 Scope of the Study
1.3 Limitation of the Study
1.4 Objectives
1.5 Significance of the Study
1.6 Definition of Terms

CHAPTER TWO
LITERATURE REVIEW
2.0 Banks Failure
2.1 Predicting Default of Enterprises
2.2 Credit Risk
2.3 The Origins of Logistic Regression
2.4 The Use of Logistic Regression Over Linear Least Squares
2.5 The Minimum Variance Method
2.6 Binary and Multinomial Logistic Regression

CHAPTER THREE
3.0 Research Methodology
3.1 Method of Data Collection
3.2 Method of Data Presentation
3.3 Data Analysis Method
3.4 Model Estimation
3.4.1 Coding and Interpretation of the Coefficients of the Model
3.5 Choosing a Significant Model
3.6 Loss Function
3.7 ROC Curve

CHAPTER FOUR
DATA PRESENTATION AND ANALYSIS
4.1 Data Presentation
4.2 Data Analysis
4.3 Descriptive Analysis
4.4 Logistic Regression Analysis
4.5 The Odds Ratio
4.6 Model Assessment
4.7 Classification and Validation
4.8 ROC Curve

CHAPTER FIVE
5.0 SUMMARY, CONCLUSIONS AND RECOMMENDATION
5.1 Summary of Findings and Conclusion
5.2 Recommendation

REFERENCES

APPENDIX 1

ABSTRACT

It is common practice for people to rely on loans in order to survive. Some borrowers repay on time, while others find it difficult to repay, leaving the lender to pursue them. There is therefore a need for a model that helps identify potential customers who are likely to default on their loans.

This research built a binary logistic regression model, with the aid of the Statistical Package for the Social Sciences (SPSS), to help lenders predict which potential customers are likely to default. It examines a dichotomous dependent variable (default) using the independent variables Loans, Balance, Collateral, Interest, Number of Days, Gender and Education Level, which are either continuous or categorical.

CHAPTER ONE

1.0 INTRODUCTION

It is common practice for people to rely on loans in order to survive. To do so, they obtain loans from relatives, banks, or thrift and credit co-operatives. Businessmen and businesswomen obtain loans to start or finance their businesses, while individuals obtain loans to start businesses, meet their daily needs, build houses, pay their children's school fees, and so on. It is not surprising that some of them find it difficult to repay, leaving lenders to pursue them. For every loan obtained from a bank or from a thrift and credit co-operative, it is therefore very important for the loan officer to assess the customer and judge whether the loan will be repaid. There is thus a need for a model that helps identify potential customers who are likely to default: customers who will not pay back (customers who will go bad) as opposed to customers who will pay back (customers who will not go bad).

Loan defaulters have brought down some banks and thrift and credit co-operatives. To avoid bankruptcy, these organizations always request collateral from customers so as to keep them on their toes to repay. Collateral is one of the factors to take into consideration in case of default. Others are the authorized limit (the amount granted), the interest, etc. To find out which factors influence default, we use a logistic regression model to predict the good or bad credit risk type. Since the dependent variable is binary (categorical with two outcomes), we use the binary logistic model, a general approach to the analysis of binary data that is comparable to the use of linear models for normally distributed responses. The binary logistic model is used for responses or outcomes that are dichotomous.

1.1 Overview and History of the Bank

Fidelity Bank Plc began operations in 1988 as Fidelity Union Merchant Bank Limited. By 1990, it had distinguished itself as the fastest growing merchant bank in the country. To leverage the emerging opportunities in the commercial and consumer end of financial services in Nigeria, it converted to commercial banking in 1999, following the issuance of a commercial banking license by the Central Bank of Nigeria, the national banking regulator, and changed its name to Fidelity Bank Plc. It became a universal bank in February 2001, with a license to offer the entire spectrum of commercial, consumer, corporate and investment banking services. In 2011, the bank was ranked the 7th most capitalized bank in Nigeria, the 25th most capitalized bank on the African continent and the 567th most capitalized bank in the world. The current enlarged Fidelity Bank is the result of the merger with the former FSB International Bank Plc and Manny Bank Plc (under the Fidelity brand name) in December 2005. Fidelity Bank is today ranked among the top 10 in the Nigerian banking industry, with a presence in all 36 states as well as in the major cities and commercial centres of Nigeria. Fidelity continues to rank among Nigeria's most capitalized banks, with tier-one capital of nearly USD 1 billion.

1.2 Scope of the Study

This research is based on data collected from Fidelity Bank, used as a case study to predict which customers are likely to default on their loans.

1.3 Limitation of the Study

The major challenge of this research was my inability to obtain data from some banks. They were reluctant to release such details because they regard them as confidential and not to be shared with third parties. Eventually, after persistent effort, I was able to obtain the data from Fidelity Bank, even though it was never easy.

1.4 Objectives:

1. To develop the best model for predicting a dichotomous response (dependent) variable Y, taking the values 0 and 1, that indicates whether a customer is a bad or good credit risk.

2. To develop a model that will help prevent loan defaults.

1.5 Significance of the Study

The study builds a binary logistic model that will help loan officers predict whether customers are likely to default on the loans given to them, that is, whether a customer is a good or bad credit risk. On this basis, loan officers can identify such customers in advance and decline to lend to them, thereby preventing loan defaults.

1.6 Definition of Terms

Authorized Limit (Loans): this is the amount granted to customers.

Binary Logistic Regression analysis examines the influence of various factors on

a dichotomous outcome by estimating the probability of the event’s occurrence.

Binary logistic regression is typically used when the dependent variable is

dichotomous and the independent variables are either continuous or categorical

variables.

Credit Risk: the risk of loss of principal or loss of a financial reward stemming

from a borrower’s failure to repay a loan or otherwise meet a contractual

obligation.

Default: The inability of a borrower to pay the interest or principal on a debt

when it is due.

Dependent Variable: this is the response, which has two possible outcomes: the risk being taken is either good or bad. As the name implies, the dependent variable depends on the independent variables.

Independent Variables: these are the various factors that influence the outcomes.

Odds: the number of defaulting cases divided by the number of non-defaulting cases.

Odds Ratio: the ratio of the odds for two different groups.

Outstanding Balance: the remaining amount owed by the customer.

Relative Risk (RR): the ratio of the risk (probability) of default in one group to the risk of default in another group.

Risk or Probability: the number of cases in which there is default divided by the total number of cases, that is, the risk of occurrence of default.

Security Type: the type of security used to secure the loans.

Security Value: the value of the security type. It is known as collateral.
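
The measures defined in this section (risk, odds, odds ratio and relative risk) can be illustrated with a short sketch. The counts below are hypothetical and are not taken from the Fidelity Bank data; the group names and function names are likewise assumptions made for illustration. Relative risk is taken here in its usual sense, the ratio of the risk in one group to the risk in another.

```python
def risk(defaults, non_defaults):
    """Risk (probability) of default: defaulting cases / all cases."""
    return defaults / (defaults + non_defaults)


def odds(defaults, non_defaults):
    """Odds of default: defaulting cases / non-defaulting cases."""
    return defaults / non_defaults


def odds_ratio(group_a, group_b):
    """Ratio of the odds of default in group_a to the odds in group_b."""
    return odds(*group_a) / odds(*group_b)


def relative_risk(group_a, group_b):
    """Ratio of the risk of default in group_a to the risk in group_b."""
    return risk(*group_a) / risk(*group_b)


# Hypothetical (defaults, non-defaults) counts, for illustration only.
unsecured_loans = (30, 70)
secured_loans = (10, 90)

print("risk, unsecured:", risk(*unsecured_loans))
print("odds, unsecured:", odds(*unsecured_loans))
print("odds ratio:", odds_ratio(unsecured_loans, secured_loans))
print("relative risk:", relative_risk(unsecured_loans, secured_loans))
```

With these made-up counts the odds ratio is larger than the relative risk, which is the general pattern whenever default is not a rare event in the groups compared.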

CHAPTER TWO

LITERATURE REVIEW

2.0 Banks Failure

Failure rates are often difficult to track properly. However, in the past few years, considerable research (e.g., Everett and Watson, 1996; and Headd, 2003) has been conducted to determine the rates and causes of such failures. Two of the principal reasons businesses suffer unexpected closures are insufficient capitalization and lack of planning. Earlier research, however, analyzed only financial ratios in order to explain firm default.

However, the literature (Peel et al., 1986; Grunet et al., 2004; Peel and Peel, 1989; Hill and Wilson, 2007; and Altman et al., 2010) concludes that financial variables are not sufficient to predict SME default and that including nonfinancial variables improves the models' predictive power. When analyzing business failure, it is extremely important to distinguish between failure and closure. Watson and Everett (1996) mention that closing firms could have been financially successful but closed for other reasons: the sale of the firm or a personal decision by the owner to accept employment with another firm, to retire, or the like. To define failure they created five categories: ceasing to exist (discontinuance for any reason); closing or a change in ownership; filing for bankruptcy; closing to limit losses; and failing to reach financial goals. Headd (2003) finds that only one-third of new businesses (33%) closed under circumstances that owners considered unsuccessful.

2.1 Predicting Default of Enterprises

According to the literature, various methods have been used to predict the default of enterprises. Beaver (1967) originally proposed the use of univariate analysis. Altman (1968), Altman et al. (1977), and Pompe and Bilderbe (2005) used Multiple Discriminant Analysis (MDA). For many years thereafter, MDA was the prevalent statistical technique applied to default prediction models. It was used by many authors (Taffler and Tisshaw, 1977; Altman et al., 1977; Micha, 1984). Ohlson (1980) was the first to apply conditional logistic regression (logit) to the study of default prediction. Research examining bankruptcy (Ohlson, 1980; Aziz et al., 1988) favors the logit over MDA for both theoretical and empirical reasons. The logit model requires less restrictive statistical assumptions and offers better empirical discrimination (Zavgren, 1983).

Moreover, the estimated coefficients can be interpreted separately as the importance or significance of each of the independent variables in explaining the estimated probability of default (PD). Other researchers also used the logit model to examine defaulting firms (Keasey and Watson, 1987; Ooghe et al., 1995; and Becchetti and Sierra, 2002).

2.2 Credit Risk

Credit risk is always a big issue in the investment world. Whether you are trying

to determine how much of a credit risk you are or how much of a risk someone or

something else is, you have to know what to look for. Credit is used to help

lenders determine whether someone is worth a risk when it comes to borrowing

money, and in the investing world it plays a big role in what you do or don’t have

access to. Trying to figure out whether something or someone has a reasonable

credit risk takes time and a lot of factors, but it can be done. Things that affect the

credit risk of any person or company include factors like:

Credit history: If the person or company in question has a stable credit history,

they are going to be a better credit risk. Those who have a poor history are less

likely to get approved for the funding that they need simply because they don’t

prove that they are worth it.

Credit rating: Your current credit rating directly impacts the level of risk that you

present. In the consumer world, a credit score of 720 or higher makes you an

excellent risk, while a score below 599 makes you a very bad credit risk.

Companies have a similar rating scale that helps determine how worthy of a risk

they will be.

Debt-to-income ratio: Whether you are a consumer or a business, a low debt-to-income ratio makes you a better credit risk, while having more going out than coming in makes you a worse one. That means keeping your debts as low as you can and paying them down whenever they get too high.

Security or collateral: In some cases, you may want or need to provide some type of asset to show that you are a good credit risk. That way, even if your information doesn't prove you to be a good candidate, the creditor or investor knows that their exposure to loss is limited.

When you’re trying to determine credit risk, these are factors to consider. The

world of credit risk is similar among businesses and consumers alike, even

though the specific rules are different from one area to the next. Ultimately, as

long as you understand the basics, you’ll be able to easily determine whether you,

or anyone or anything else, are worth the investment in the end.

Types of Credit Risk

Credit risk can be classified in the following way:

Credit default risk - The risk of loss arising from a debtor being unlikely to pay

its loan obligations in full or the debtor is more than 90 days past due on any

material credit obligation; default risk may impact all credit-sensitive

transactions, including loans, securities and derivatives.

Concentration risk - The risk associated with any single exposure or group of exposures with the potential to produce losses large enough to threaten a bank's core operations. It may arise in the form of single name concentration or industry concentration.

Country risk - The risk of loss arising from a sovereign state freezing foreign

currency payment (transfer/conversion risk) or when it defaults on its obligations

(sovereign risk).

Assessing credit risk

Significant resources and sophisticated programs are used to analyze and manage

risk. Some companies run a credit risk department whose job is to assess the

financial health of their customers, and extend credit (or not) accordingly. They

may use in house programs to advise on avoiding, reducing and transferring risk.

They also use third party provided intelligence.

Most lenders employ their own models (credit scorecards) to rank potential and

existing customers according to risk, and then apply appropriate strategies. With

products such as unsecured personal loans or mortgages, lenders charge a higher

price for higher risk customers and vice versa. With revolving products such as

credit cards and overdrafts, risk is controlled through the setting of credit limits.

Some products also require security, most commonly in the form of property.

Credit scoring models also form part of the framework used by banks or lending institutions to grant credit to clients. For corporate and commercial borrowers, these models generally have qualitative and quantitative sections outlining various aspects of the risk including, but not limited to, operating experience, management expertise, asset quality, and leverage and liquidity ratios, respectively. Once this information has been fully reviewed by credit officers and credit committees, the lender provides the funds subject to the terms and conditions presented within the contract.

Sovereign risk

Sovereign risk is the risk of a government becoming unwilling or unable to meet its loan obligations, or reneging on loans it guarantees. Many countries faced sovereign risk during the late-2000s global recession. The existence of such risk means that creditors should follow a two-stage decision process when deciding to lend to a firm based in a foreign country: first consider the sovereign risk quality of the country, and then consider the firm's credit quality.

Five macroeconomic variables that affect the probability of sovereign debt rescheduling are:

Debt service ratio

Import ratio

Investment ratio

Variance of export revenue

Domestic money supply growth

Counterparty risk

Counterparty risk, also known as default risk, is the risk that a counterparty will not pay what it is obligated to pay on a bond, credit derivative, trade credit insurance or payment protection insurance contract, or other trade or transaction when it is supposed to. Financial institutions may hedge or take out credit insurance of some sort with a counterparty, which may itself be unable to pay when required to do so, either because of temporary liquidity issues or for longer-term systemic reasons. Large insurers are counterparties to many transactions, and thus this is the kind of risk that prompts financial regulators to act, e.g., the bailout of insurer AIG. On the methodological side, counterparty risk can be affected by wrong-way risk, namely the risk that different risk factors are correlated in the most harmful direction. Including the correlation between the portfolio risk factors and the counterparty default in the methodology is not trivial.

Mitigating credit risk

Lenders mitigate credit risk using several methods:

Risk-based pricing: Lenders generally charge a higher interest rate to borrowers who are more likely to default, a practice called risk-based pricing. Lenders consider factors relating to the loan, such as loan purpose, credit rating, and loan-to-value ratio, and estimate the effect on yield (credit spread).

Covenants: Lenders may write stipulations on the borrower, called covenants, into loan agreements:

Periodically report its financial condition

Refrain from paying dividends, repurchasing shares, borrowing further, or other specific, voluntary actions that negatively affect the company's financial position

Repay the loan in full, at the lender's request, in certain events such as changes in the borrower's debt-to-equity ratio or interest coverage ratio

Credit insurance and credit derivatives:

Lenders and bond holders may hedge their credit risk by purchasing credit

insurance or credit derivatives. These contracts transfer the risk from the lender to

the seller (insurer) in exchange for payment. The most common credit derivative

is the credit default swap.

Tightening: Lenders can reduce credit risk by reducing the amount of credit

extended, either in total or to certain borrowers. For example, a distributor selling

its products to a troubled retailer may attempt to lessen credit risk by reducing

payment terms from net 30 to net 15.

Diversification: Lenders to a small number of borrowers (or kinds of borrower)

face a high degree of unsystematic credit risk, called concentration risk. Lenders

reduce this risk by diversifying the borrower pool.

Deposit insurance: Many governments establish deposit insurance to guarantee

bank deposits of insolvent banks. Such protection discourages consumers from

withdrawing money when a bank is becoming insolvent, to avoid a bank run, and

encourages consumers to hold their savings in the banking system instead of in

cash.

2.3 The Origins of the Logistic Regression

The logistic regression model is one of the most frequently used statistical models for a dichotomous outcome with several predictor variables, which may be either numerical or categorical; such an outcome cannot be analyzed properly using ordinary linear regression. Given the steady role regression analysis plays in research, there is no evidence that its usefulness will decline in the near future. In statistics, logistic regression, sometimes called the logistic model or logit model, is used to predict the probability of occurrence of an event by fitting data to a logistic curve.

The logistic function was invented in the 19th century to describe the growth of populations and the course of autocatalytic chemical reactions, or chain reactions. In either case we consider the time path of a quantity $W(t)$ and its growth rate

$$\dot{W}(t) = \frac{dW(t)}{dt}. \qquad (1)$$

The simplest assumption is that $\dot{W}(t)$ is proportional to $W(t)$:

$$\dot{W}(t) = \beta W(t), \qquad \beta = \frac{\dot{W}(t)}{W(t)}, \qquad (2)$$

with $\beta$ the constant rate of growth. This leads, of course, to exponential growth, $W(t) = A\exp(\beta t)$, where $A$ is sometimes replaced by the initial value $W(0)$. With $W(t)$ the human population of a country, this is a model of unopposed growth; as Malthus (1789) put it, "a human population, left to itself, will increase in geometric progression". It is a reasonable model for a young and empty country like the US in its early years. Like many others, Alphonse Quetelet, the Belgian astronomer turned statistician, was well aware that the indiscriminate extrapolation of exponential growth must lead to impossible values.

Like Quetelet, Verhulst approached the problem by adding an extra term to equation (2) to represent the increasing resistance to further growth, as in

$$\dot{W}(t) = \beta W(t) - \phi(W(t)), \qquad (3)$$

and then experimenting with various forms of $\phi$. The logistic appears when this is a simple quadratic, for in that case we may rewrite equation (3) as

$$\dot{W}(t) = \beta W(t)\,(\Omega - W(t)), \qquad (4)$$

where $\Omega$ denotes the upper limit or saturation level of $W$, its asymptote as $t \to \infty$. Growth is now proportional both to the population already attained, $W(t)$, and to the remaining room for further expansion, $\Omega - W(t)$. If we express $W(t)$ as a proportion $P(t) = W(t)/\Omega$, this gives (with the constant factor $\Omega$ absorbed into $\beta$)

$$\dot{P}(t) = \beta P(t)\,(1 - P(t)), \qquad (5)$$

and the solution of this differential equation is

$$P(t) = \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}, \qquad (6)$$

which Verhulst named the logistic function. The population $W(t)$ then follows

$$W(t) = \Omega\,\frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}. \qquad (7)$$
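
As a check on the solution quoted for equation (5), the logistic differential equation can be integrated by separation of variables. The sketch below assumes $0 < P(t) < 1$ and writes $\alpha$ for the constant of integration, matching the parameter named in equation (7); the intermediate steps are a standard derivation and are not taken from the original text.

```latex
\begin{align*}
\frac{dP}{dt} &= \beta P\,(1 - P) \\
\frac{dP}{P\,(1 - P)} &= \beta\,dt \\
\left(\frac{1}{P} + \frac{1}{1 - P}\right) dP &= \beta\,dt \\
\ln\frac{P}{1 - P} &= \alpha + \beta t \\
P(t) &= \frac{\exp(\alpha + \beta t)}{1 + \exp(\alpha + \beta t)}
\end{align*}
```

The fourth line shows that the log-odds of $P$ are linear in $t$, which is the property that logistic regression later exploits.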

Verhulst published his suggestions between 1838 and 1847. He explains that he did his research a couple of years earlier, that he did not have the time for an update, and that he published the note only at the insistence of Quetelet; he named the curve the logistic. Verhulst also determined the three parameters Ω, α, and β of equation (7) by making the logistic curve pass through three observed points. His discovery of the logistic curve was not taken up with much enthusiasm by Quetelet; as Vanpaeemel (1987) has shown, the two men did not see eye to eye on the question of population growth. As a model of population growth, the logistic function was discovered anew in 1920 by Pearl and Reed. They were unaware of Verhulst's work (though not of the curves for autocatalytic reactions discussed presently), and they arrived independently at the logistic curve of equation (7). Verhulst's work was rediscovered soon after Pearl and Reed's first paper of 1920. Yule treated Verhulst much more handsomely than Pearl and Reed did, devoting an appendix to his work. Yule was also the first author to revive the name logistic, which is not used by Liagre or Du Pasquier (a mathematician who later followed courses in social sciences) nor by Pearl and Reed in their earlier references. By 1924, however, "logistic" is used as a commonplace term in the correspondence between Pearl and Yule, who were lifelong friends.

As we have already hinted, there is another early root of the logistic function in chemistry, where it was employed (again with some variations) to describe the course of autocatalytic or chain reactions, where the product itself acts as a catalyst for the process while the supply of raw materials is fixed. This leads naturally to a differential equation like (5) and hence to the logistic function for the time path of the amount of the reaction product. The review of the application of logistic curves to a number of such processes by Reed and Berkson (1929) quotes work of 1883 by the German professor of chemistry Wilhelm Ostwald. Authors like Yule (1925) and Wilson (1925) were well aware of this strand of the literature. The basic idea of logistic growth is simple and effective, and it is used to this day to model population growth and the market penetration of new products and technologies. The introduction of mobile telephones is an autocatalytic process, and so is the spread of many new products and techniques in industry.

The close resemblance of the logistic to the normal distribution function must have been common knowledge among those who were familiar with the logistic; it had been demonstrated by Wilson (1925) and written up by Winsor (1932), another collaborator of Pearl. Wilson was probably the first to publish an application of the logistic curve in bio-assay, in Wilson and Worcester (1943), just before Berkson (1944). But it was Berkson who persisted and fought a long and spirited campaign for the logit, which lasted for several decades.

An accurate history of the adoption and further development of the logit

would require an intimate knowledge of several quite distinct disciplines, for

many new generalizations were introduced independently and in almost complete

isolation in completely unrelated applied work.

In statistics, the analytical advantages of the logit transformation as a means of dealing with discrete binary outcomes were soon recognized. Cox was among the first to explore (and exploit) these possibilities; he wrote a series of papers around 1960 and followed these up with an influential textbook in 1969. The logit model of bio-assay is easily generalized to logistic regression, where binary outcomes are related to a number of determinants without a specific theoretical background, and this statistical model proved as fertile as linear regression in an earlier era. Later, the link of the logistic model with discriminant analysis was recognized, as was its ready association with loglinear models in general. On the specific issue of estimating logit and probit (probability unit) models, maximum likelihood estimation became the norm when routines for this method, applicable to individual data, were included in commercial statistical program packages. This facility was probably first offered by the BMDP (biomedical data processing) program of 1977. By the time the first comprehensive textbook with medical applications, Hosmer and Lemeshow (1989), was published, the use of such routines was taken for granted. Of the two causes Berkson advocated, minimum chi-squared estimation was effectively overtaken by the computer revolution, while the logit transformation, $\operatorname{logit}(p) = \log\frac{p}{1-p}$, was triumphant. The theoretical justification of bio-assay in terms of determinate stimulus and random thresholds was first jettisoned in the change to logistic regression, and then retrieved in the form of the latent regression equation model that is still dear to the behavioural sciences.
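
The logit transformation mentioned in this passage, logit(p) = log(p / (1 - p)), and its inverse, the logistic function, can be sketched in a few lines. The function names `logit` and `inverse_logit` are assumptions chosen for illustration; the branch on the sign of `z` is only a common numerical safeguard against overflow, not something the original text prescribes.

```python
import math


def logit(p):
    """Log-odds of a probability p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Logistic function: maps any real z back to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, exp(z) is small, which avoids overflow in exp(-z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


# A probability of default of 0.2 corresponds to odds of 0.2 / 0.8 = 0.25.
z = logit(0.2)
print("log-odds:", z)
print("back-transformed probability:", inverse_logit(z))
```

Because the log-odds can take any real value, they can be modelled as a linear function of the predictors, and the inverse transformation always returns a valid probability.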

An example of simultaneous independent discoveries is the generalization of logistic regression to the multinomial or polychotomous case. This was first set out, at some length, by the biometric statistician Mantel (1966). Some years later it was rediscovered independently by the econometrician Theil (1969), who arrived at it from the general perspective of modelling shares.

For a long time, logistic regression, whether in the binary or the multinomial context, was principally used as a technique, a simple tool without a specific underlying process and therefore without a characteristic interpretation. But in 1973 McFadden, working as a consultant for a California public transportation project, linked the multinomial logit to the theory of discrete choice from mathematical psychology. This provided a theoretical foundation for the logit model that is much more profound than any theory put forward for the use of the probit in bio-assay.


We use this same logistic regression to identify the customers who are likely to default on loans granted to them, taking into account the factors that influence whether a loan turns out good or bad.

2.4 The Use of Logistic Regression over Linear Least Squares

Logistic regression differs from linear least squares regression in that its equations are solved iteratively, whereas least squares can be solved explicitly with a formula. A trial equation is fitted and adjusted over and over in order to improve the fit. Iterations stop when the improvement from one step to the next is suitably small. From a practical point of view the two are similar because both are used for prediction, but the response variable in logistic regression is an indicator of some characteristic, that is, a 0/1 variable. It is used to

determine whether

2.5 The Minimum Variance Method

In this method, the line of best fit is determined by the estimates of α and β. These estimates have minimum variance among all unbiased estimators of α and β.

2.6 Binary and Multinomial Logistic Regression

Binary logistic regression is the type of logistic regression model used for a categorical dependent variable with two outcomes, while multinomial logistic regression is used when the dependent variable has more than two outcomes.

Choosing a Procedure for Binary Logistic Regression Models

Binary logistic regression models can be fitted using either the Logistic Regression procedure or the Multinomial Logistic Regression procedure. Each procedure has options not available in the other. An important theoretical distinction is that the Logistic Regression procedure produces all predictions, residuals, influence statistics, and goodness-of-fit tests using data at the individual case level, regardless of how the data are entered and whether or not the number of covariate patterns is smaller than the total number of cases, while the Multinomial Logistic Regression procedure internally aggregates cases to form subpopulations with identical covariate patterns for the predictors, producing predictions, residuals, and goodness-of-fit tests based on these subpopulations. If all predictors are categorical or any continuous predictors take only a limited number of values, so that there are several cases at each distinct covariate pattern, the subpopulation approach can produce valid goodness-of-fit tests and informative residuals, while the individual case-level approach cannot.

Binary Logistic Regression provides the following unique features:

• Hosmer-Lemeshow test of goodness of fit for the model
• Stepwise analyses
• Contrasts to define model parameterization
• Alternative cut points for classification
• Application of a model fitted on one set of cases to a held-out set of cases
• Saving of predictions, residuals, and influence statistics

Multinomial Logistic Regression provides the following unique features:

• Pearson and deviance chi-square tests for goodness of fit of the model
• Specification of subpopulations for grouping of data for goodness-of-fit tests
• Listing of counts, predicted counts, and residuals by subpopulations
• Correction of variance for over-dispersion
• Covariance matrix of the parameter estimates
• Tests of linear combinations of parameters
• Explicit specification of nested models
• Fitting of 1-1 matched conditional logistic regression models using differenced variables


CHAPTER THREE

3.0 Research Methodology

Binary Logistic Regression model is the statistical tool used to carry out this

research in order to study the tendency of customers who will default on the loans

given to them. Records of 300 customers were used in the process.

3.1 Method of Data Collection

The data collected for this research are the records of customers who collected loans from Fidelity Bank Nigeria PLC. Their balances as at 5th of May, 2013, are given in Appendix I. This is secondary data because it would be almost impossible to collect these data from the individuals directly.

3.2 Method of Data Presentation

The data collected will be shown in tabular form with each variable forming the

columns and each customer makes up the rows. The result after analysing the

data is shown through tables and plots in chapter four.

3.3 Data Analysis Method

The Statistical Package for the Social Sciences (SPSS) is employed to analyse the data using the Binary Logistic Regression model, which is used to estimate the parameters because the dependent variable has two outcomes.

3.4 Model Estimation

A Linear Logistic Model (LLM) assumes that for each possible set of values of the independent (X) variables, there is a probability p that an event (success) occurs. The model is that Y is a linear combination of the values of the x vector:

Y = β0 + Σ βixi + e, i = 1, 2, 3, ..., n

that is,

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ... + βnxn + e


If x1, x2, ..., xn are a collection of independent variables and Y is a binary response variable with probability of success p, then

E(Y|X) = β0 + β1x1 + β2x2 + β3x3 + ... + βnxn

Y = 1 with probability p and

Y = 0 with probability (1 - p)

Odds = p/(1 - p)

The logistic regression model is given as Logit(y) = ln(p/(1 - p)). It starts by considering the existence of an unobserved continuous variable, Y, which can be thought of as the customer’s propensity to default on a loan, with larger values of Y corresponding to greater probabilities of defaulting.

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ... + βnxn + e

where β0 is the constant of the equation, βi (i = 1, 2, 3, ..., n) are the coefficients of the predictor variables, and e is the residual, i.e., the variability not explained by the model.

The model assumes that Y is linearly related to the predictors.

In the logistic regression model, the relationship between Y and the probability of the event of interest is described by the link function

Y = ln(p/(1 - p))

where

p is the probability that each case experiences the event of interest, and

Y is the value of the unobserved continuous variable for each case.

Since Y is unobserved, we relate the predictors to the probability of interest by substituting

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ... + βnxn + e

into the link function. The regression coefficients are then estimated through an iterative maximum likelihood method.
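The substitution can be made explicit by solving the link function for p. The short derivation below is a sketch that uses only the systematic part of the model (dropping the error term e is an assumption made here for readability):

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
\quad\Longrightarrow\quad
\frac{p}{1-p} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n}
\quad\Longrightarrow\quad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
```

Whatever values the predictors take, the resulting p always lies strictly between 0 and 1, which is why the logit link is suitable for a probability.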

3.4.1 Coding and Interpretation of the Coefficients of the Model

Dependent variable: the occurrence of default is coded 0 and the absence of default is coded 1.

Independent variables: these are of different types:

Numerical variable: in order to introduce the variable in the model, it must satisfy the linearity hypothesis, i.e., for each unit increase in the numerical variable, the odds are multiplied by a constant value, the odds ratio OR = exp(βi).

Dichotomous variable: gender, with Male coded as 1 and Female coded as 2.

Categorical variable: education level has the following codes

o Primary = 1

o Secondary = 2

o OND = 3

o HND/BSc = 4

o Other professional qualifications = 5

When the coefficient β of a variable is positive, we obtain OR > 1, and the variable therefore corresponds to a risk factor. If the value of β is negative, OR will be < 1, and the variable therefore corresponds to a protective factor.

3.5 Choosing a Significant Model

An estimating algorithm is used to find the coefficients (the β’s) that best satisfy the relationship expressed in the regression equation for the estimation data sample. The technique used to find those coefficients for logistic regression is maximum likelihood estimation. Basically, the method tries sets of coefficients until it finds the set that maximizes the value of a mathematical function that gives the joint probability of observing the given data.

That function, L, the likelihood function, forms the basis of a statistical test of how well the model fits the observed data.

$$L^2 = 2(\log L_T - \log L_B)$$

where L_T is the likelihood of the tested model (the model containing the added variables) and L_B is the likelihood of the baseline model (the model with the smaller set of variables).

L² is a statistic that is compared with the chi-square table to determine whether the tested model fits significantly better than the baseline model. This procedure establishes whether the added variables significantly improve the fit to the data or whether, conversely, the smaller subset is equally sufficient. We use this procedure to omit from the model variables that do not significantly improve our ability to predict insolvency.

Classification plots are used to display the results of the analysis graphically. The Hosmer-Lemeshow goodness-of-fit statistic indicates a poor fit if the significance value is less than 0.05.

Pseudo R-squared statistics are used since the R-squared statistic, which measures the variability in the dependent variable explained by a linear regression model, cannot be computed for logistic regression models. They are designed to have similar properties to the true R-squared statistic.

Also, the square of the ratio of the coefficient to its standard error equals the

Wald statistic.

Wald Test

The Wald test is used to test the statistical significance of each coefficient (b) in the model. A Wald test calculates a Z statistic, which is

$$Z = \frac{b}{SE(b)}$$

This value is squared, which yields a statistic with a chi-square distribution that is used as the Wald test statistic, $W = \left(b / SE(b)\right)^2$. (Alternatively, the Z value can be compared directly to a normal distribution.)
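As a rough illustration of the calculation described above, the sketch below computes the Z value, the Wald statistic and its p-value for one coefficient. The function name `wald_test` is hypothetical, and the inputs are the rounded B and S.E. of the constant reported in SPSS OUTPUT 8, so the result only approximates the Wald value that SPSS computes from unrounded figures.

```python
import math


def wald_test(b, se):
    """Return (z, wald, p_value) for a single regression coefficient."""
    z = b / se                # coefficient divided by its standard error
    wald = z ** 2             # squared Z value, chi-square with 1 df
    # Upper-tail probability of a chi-square variable with 1 degree of
    # freedom: P(X > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return z, wald, p_value


# Rounded values of the constant in the Block 0 model (SPSS OUTPUT 8).
z, wald, p_value = wald_test(1.492, 0.213)
print(f"Z = {z:.3f}, Wald = {wald:.3f}, p < 0.001: {p_value < 0.001}")
```

The small difference between this Wald value and the reported 49.042 comes from using the rounded table entries.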

The goodness of fit of a statistical model describes how well it fits a set of

observations. Measures of goodness of fit typically summarize the discrepancy

between observed values and the values expected under the model in question.

Such measures can be used in statistical hypothesis testing, e.g. to test for

normality of residuals, to test whether two samples are drawn from identical

distributions or whether outcome frequencies follow a specified distribution.

One way in which a measure of goodness of fit can be constructed, in the case where the variance of the measurement error is known, is to construct a weighted sum of squared errors:

$$\chi^2 = \sum \frac{(O - E)^2}{\sigma^2}$$

where $\sigma^2$ is the known variance of the observation, O is the observed value and E is the value expected under the model. This definition is only useful when one has estimates for the error on the measurements, but it leads to a situation where a chi-square distribution can be used to test goodness of fit, provided that the errors can be assumed to have a normal distribution.

The reduced chi-squared statistic is simply the chi-squared divided by the number of degrees of freedom:

$$\chi^2_{red} = \frac{\chi^2}{\nu} = \frac{1}{\nu} \sum \frac{(O - E)^2}{\sigma^2}$$

where $\nu$ is the number of degrees of freedom, usually given by N - n - 1, where N is the number of observations and n is the number of fitted parameters, assuming that the mean value is an additional fitted parameter. The advantage of the reduced chi-squared is that it already normalizes for the number of data points and model complexity.

As a rule of thumb, a large $\chi^2_{red}$ indicates a poor model fit.
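The two statistics above can be sketched in a few lines of code. The data below are invented purely for illustration (five observations, a common known variance of 0.04 and one fitted parameter), and the function names are hypothetical.

```python
def chi_squared(observed, expected, variances):
    """Weighted sum of squared errors: sum((O - E)^2 / sigma^2)."""
    return sum((o - e) ** 2 / v
               for o, e, v in zip(observed, expected, variances))


def reduced_chi_squared(observed, expected, variances, n_fitted):
    """Chi-squared divided by the degrees of freedom N - n - 1."""
    dof = len(observed) - n_fitted - 1
    if dof <= 0:
        raise ValueError("not enough observations for the fitted parameters")
    return chi_squared(observed, expected, variances) / dof


# Hypothetical observations, model values and known variances.
observed = [2.1, 3.9, 6.2, 7.8, 10.1]
expected = [2.0, 4.0, 6.0, 8.0, 10.0]
variances = [0.04] * 5

chi2 = chi_squared(observed, expected, variances)
chi2_red = reduced_chi_squared(observed, expected, variances, n_fitted=1)
print(f"chi-squared = {chi2:.3f}, reduced chi-squared = {chi2_red:.3f}")
```

A reduced value close to 1 suggests that the discrepancies are of the size expected from the stated variances, while a much larger value points to a poor fit.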


3.6 Loss Function

A loss function is a measure of fit between a mathematical model of data and the actual data. We choose the parameters of our model to minimize the badness-of-fit or to maximize the goodness-of-fit of the model to the data. With least squares (the only loss function we have used thus far), we minimize SSres, the sum of squared residuals. This also happens to maximize SSreg, the sum of squares due to regression. With linear or curvilinear models, there is a mathematical solution to the problem that will minimize the sum of squares, that is,

$$b = (X'X)^{-1} X'y$$

or

$$\beta = R^{-1} r$$

With some models, like this logistic curve, there is no mathematical solution

that will produce least squares estimates of the parameters. For these models, the

loss function chosen is called maximum likelihood. Likelihood is a conditional

probability (e.g., P(Y|X), the probability of Y given X). We can pick the

parameters of the model (a and b of the logistic curve) at random or by trial-and-

error and then compute the likelihood of the data given those parameters. We will

choose as our parameters, those that result in the greatest likelihood computed.

The estimates are called maximum likelihood because the parameters are chosen

to maximize the likelihood (conditional probability of the data given parameter

estimates) of the sample data. The techniques fall under the general label

numerical analysis. There are several methods of numerical analysis, but they all

follow a similar series of steps. First, the computer picks some initial estimates of

the parameters. Then it will compute the likelihood of the data given these

parameter estimates. Then it will improve the parameter estimates slightly and recalculate the likelihood of the data. It will repeat this until it is told to stop, which we usually do when the parameter estimates no longer change much. Sometimes we also tell the computer to stop after a certain number of tries or iterations, e.g., 20 or 250; reaching that limit before the estimates settle usually indicates a problem in estimation.
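The iterative procedure described above can be sketched as follows. This is a minimal illustration, not the routine SPSS uses: it assumes a single predictor, starts both parameters at zero, improves them with Newton-Raphson steps, and stops when the estimates change by less than a tolerance or when an iteration limit is reached. The data and the names `fit_logistic` and `log_likelihood` are hypothetical.

```python
import math


def log_likelihood(a, b, x, y):
    """Log-likelihood of 0/1 outcomes y under p = 1 / (1 + exp(-(a + b*x)))."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def fit_logistic(x, y, tol=1e-3, max_iter=20):
    """Newton-Raphson maximum likelihood fit of a one-predictor logistic curve."""
    a, b = 0.0, 0.0                                   # initial estimates
    for iteration in range(1, max_iter + 1):
        p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to a and b.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * xi for wi, xi in zip(w, x))
        h_bb = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        # Stop when the parameter estimates no longer change much.
        if max(abs(step_a), abs(step_b)) < tol:
            return a, b, iteration, True
    return a, b, max_iter, False                      # iteration limit reached


# Hypothetical data: one predictor and a 0/1 outcome with overlapping groups.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
a_hat, b_hat, n_iter, converged = fit_logistic(x, y)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, iterations = {n_iter}, converged = {converged}")
```

The tolerance of .001 mirrors the stopping message printed by SPSS ("parameter estimates changed by less than .001"); a run that returns without converging corresponds to the estimation problem mentioned in the text.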


3.7 Receiver Operating Characteristic (ROC) Curve

A measure of goodness of fit often used to evaluate a logistic regression model is based on the simultaneous measurement of sensitivity (the true positive rate) and specificity (the true negative rate) for all possible cut-off points. First, we calculate sensitivity and specificity pairs for each possible cut-off point and plot sensitivity on the y axis against (1 - specificity) on the x axis. This curve is called the receiver operating characteristic (ROC) curve. For a model that performs at least as well as chance, the area under the ROC curve ranges from 0.5 to 1.0, with larger values indicative of better fit.

Test variables are often composed of probabilities from logistic regression. The

state variable can be the true category to which a subject belongs. The value of

the state variable indicates which category should be considered positive.
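The construction of the curve can be sketched as below. The predicted probabilities and true states are invented for illustration, the function names are hypothetical, and the area is obtained with the trapezoidal rule, which is one common choice rather than necessarily the method SPSS applies.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs for every cut-off point."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]                 # cut-off above every score
    for cut in sorted(set(scores), reverse=True):
        # A case is classified positive when its score reaches the cut-off.
        tp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= cut and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical test variable (predicted probability) and state variable (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 1, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(f"AUC = {auc:.3f}")
```

With these made-up values the area equals the share of positive-negative pairs in which the positive case receives the higher score, which is the usual interpretation of the area under the curve.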


CHAPTER FOUR

4.0 DATA PRESENTATION AND ANALYSIS

4.1 Data Presentation

The data used for this analysis comprised 300 customers of Fidelity Bank Nigeria PLC, dated 5th of May, 2013. The data are shown in Appendix II.

4.2 Data Analysis

The analysis was carried out on SPSS using Binary Logistic Regression.

4.3 Descriptive Analysis

SPSS OUTPUT 1: previously defaulted * validate Crosstabulation

                                                  validate
previously defaulted                              0          1          Total
Yes     Count                                     18         27         45
        % within previously defaulted             40.0%      60.0%      100.0%
        % within validate                         13.8%      18.4%      16.2%
No      Count                                     112        120        232
        % within previously defaulted             48.3%      51.7%      100.0%
        % within validate                         86.2%      81.6%      83.8%
Total   Count                                     130        147        277
        % within previously defaulted             46.9%      53.1%      100.0%
        % within validate                         100.0%     100.0%     100.0%


The cross tabulations also show that the modeling sample contains 120 customers

who did not default on a previous loan, and 27 who did default. The validation or

holdout sample contains 112 customers who did not default, and 18 who did.

SPSS OUTPUT 2

Descriptive Statistics

N Minimum Maximum Mean Std. Deviation

Loan 300 46000 75000000 1845374.49 4640069.418

Balance 300 .00 4899992.00 512685.6708 1168880.57401

Collateral 300 46000 100000000 2979944.81 6900862.169

Interest 300 0 234 17.82 15.735

Days 300 1 3376 478.39 386.167

Gender 300 1 2 1.46 .499

Edlev 300 1 5 3.02 1.379

Default 277 0 1 .84 .370

Valid N (listwise) 277

Previously Defaulted

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Yes         45          15.0      16.2            16.2
          No          232         77.3      83.8            100.0
          Total       277         92.3      100.0
Missing   System      23          7.7
Total                 300         100.0

Gender

                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Male        163         54.3      54.3            54.3
          Female      137         45.7      45.7            100.0
          Total       300         100.0     100.0

EDUCATION LEVEL

                                              Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Primary                             54          18.0      18.0            18.0
          Secondary                           61          20.3      20.3            38.3
          OND                                 67          22.3      22.3            60.7
          HND/Bsc                             60          20.0      20.0            80.7
          Other Postgraduate qualifications   58          19.3      19.3            100.0
          Total                               300         100.0     100.0

4.4 Logistic Regression Analysis

Block 0: Beginning Block

SPSS OUTPUT 3

Case Processing Summary

Unweighted Casesa N Percent

Selected Cases

Included in Analysis 147 49.0

Missing Cases 0 .0

Total 147 49.0

Unselected Cases 153 51.0

Total 300 100.0

a. If weight is in effect, see classification table for the total number of

cases.

We will use a random sample of 147 of these 277 customers to create a risk model. We will set aside the remaining 130 customers as a holdout or validation sample on which to test the credit-risk model, and then use the model to classify the 23 prospective customers as good or bad credit risks.


SPSS OUTPUT 4

Dependent Variable Encoding

Original Value Internal Value

Yes 0

No 1

“1” represents not defaulting (non defaulter) while “0” represents defaulting

(defaulter).

Block 0: Beginning Block

SPSS OUTPUT 5

Classification Table (a)

                                        Predicted
                                        Selected Cases (b)                       Unselected Cases (c,d)
                                        previously defaulted    Percentage       previously defaulted    Percentage
Observed                                Yes       No            Correct          Yes       No            Correct
Step 1   previously defaulted   Yes     0         27            0                0         12            0
                                No      0         120           100.0            0         118           100
         Overall Percentage                                     81.6                                     86.2

a. The cut value is .500

b. Selected cases validate EQ 1

c. Unselected cases validate NE 1

d. Some of the unselected cases are not classified due to either missing values in the independent variables or

categorical variables with values out of the range of the selected cases.

The classification table shows that the model makes a correct prediction 81.6% of the time over the selected cases and 86.2% of the time over the unselected cases.


SPSS OUTPUT 6

Categorical Variables Codings

                                            Frequency   Parameter coding
                                                        (1)      (2)      (3)      (4)
Edlev   Primary                             26          1.000    .000     .000     .000
        Secondary                           34          .000     1.000    .000     .000
        OND                                 27          .000     .000     1.000    .000
        HND/Bsc                             36          .000     .000     .000     1.000
        Other Postgraduate qualifications   24          .000     .000     .000     .000

The table above shows that there are Primary (26), Secondary (34), OND (27), HND/BSC

(36) and Other Postgraduate qualifications (24) customers who obtained loans.

SPSS OUTPUT 7

Variables not in the Equationa

Score Df Sig.

Step 0 Variables

Edlev 1.730 1 .188

Loan .253 1 .615

Balance 3.874 1 .049

Collateral .438 1 .508

Days 5.338 1 .021

Gender 1.269 1 .260

a. Residual Chi-Squares are not computed because of redundancies.

SPSS OUTPUT 7 labeled Variables not in the Equation lists each of the predictors in

turn. Variables not in the Equation tells us that some of the independent variables improve

the model while some do not. Balance and Days are significant while the others are not

significant.


SPSS OUTPUT 8 Variables in the Equation

B S.E. Wald Df Sig. Exp(B)

Step 0 Constant 1.492 .213 49.042 1 .000 4.444

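Output 8 summarizes the model (variables in the equation). At this stage the model contains only the constant, whose value is β0 = 1.492. The Wald statistic of the constant is

$$\left(\frac{B}{S.E.}\right)^2 = 49.042$$

and the constant is the natural log of the odds:

$$\ln(ODDS) = \ln\left(\frac{P}{1 - P}\right) = 1.492$$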
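As a quick check on the constant-only model, the odds and the constant can be recovered from the counts in the modeling sample (120 non-defaulters and 27 defaulters, taken from SPSS OUTPUT 1). The sketch below assumes that, with no predictors in the model, the fitted probability is simply the observed proportion of cases coded 1; the helper name `inverse_logit` is hypothetical.

```python
import math


def inverse_logit(value):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-value))


non_defaulters = 120          # coded 1 in the modeling sample
defaulters = 27               # coded 0 in the modeling sample

odds = non_defaulters / defaulters     # odds of not defaulting, Exp(B)
b0 = math.log(odds)                    # constant of the Block 0 model
p = inverse_logit(b0)                  # back to a probability

print(f"odds = {odds:.3f}, constant = {b0:.3f}, P(not defaulting) = {p:.3f}")
```

The probability recovered from the constant is the same 120 out of 147 that gives the overall percentage correct for the selected cases in the Block 0 classification table.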

The classification table for Block 1 (SPSS OUTPUT 9) shows that:

1. For the selected cases: 84.4% were accurately predicted; 5 out of 27 were correctly predicted for customers who had previously defaulted and 119 out of 120 were correctly predicted for customers who were non-defaulters.

2. For the unselected cases: 86.9% were accurately predicted; 6 out of 18 were correctly predicted for customers who had previously defaulted and 107 out of 112 were correctly predicted for customers who were non-defaulters.

Block 1: Method = Enter

SPSS OUTPUT 9

Classification Table (a)

                                        Predicted
                                        Selected Cases (b)                       Unselected Cases (c,d)
                                        previously defaulted    Percentage       previously defaulted    Percentage
Observed                                Yes       No            Correct          Yes       No            Correct
Step 1   previously defaulted   Yes     5         22            18.5             6         12            33.3
                                No      1         119           99.2             5         107           95.5
         Overall Percentage                                     84.4                                     86.9

a. The cut value is .500

b. Selected cases Validate EQ 1

c. Unselected cases Validate NE 1

d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range

of the selected cases.


SPSS OUTPUT 10 Hosmer and Lemeshow Test

Step Chi-square Df Sig.

1 9.059 8 .337

Decision: The lack of significance of the Chi-Squared test indicates that the

model is a good fit since 0.337 is greater than 0.05(the level of significance).

SPSS OUTPUT 11

Variables in the Equation

B S.E. Wald df Sig. Exp(B)

Step 1a

Edlev -.189 .187 1.017 1 .313 .828

Loan .000 .000 .051 1 .821 1.000

Balance .000 .000 6.861 1 .009 1.000

Collateral .000 .000 2.430 1 .119 1.000

Interest -.107 .042 6.625 1 .010 .899

Days .002 .001 4.468 1 .035 1.002

Gender -.554 .475 1.364 1 .243 .574

Constant 4.179 1.343 9.688 1 .002 65.328

a. Variable(s) entered on step 1: Edlev, Loan, Balance, Collateral, Interest, Days, Gender.

At the 0.05 level of significance, Balance, Interest and Days are significant, but Education Level, Loan, Collateral and Gender are not significant by the Wald statistic. The test of the intercept (i.e., the constant) merely suggests whether an intercept should be included in the model. For the present data set, the test result (p < 0.05) suggests that a model with an intercept should be applied to the data.


The coefficient estimates are used to estimate the probability of not defaulting as follows:

$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$$

with the linear predictor

Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ... + βnxn

Hence:

ln(p/(1 - p)) = Y = 4.179 - 0.189x1 + 0.000x2 + 0.000x3 + 0.000x4 - 0.107x5 + 0.002x6 - 0.554x7

Logit(Y) = 4.179 - 0.189 EDUCATIONLEVEL + 0.000 LOAN + 0.000 BALANCE + 0.000 COLLATERAL - 0.107 INTEREST + 0.002 DAYS - 0.554 GENDER

As with any regression, positive coefficients indicate a positive relationship with the dependent variable (here coded 1 for not defaulting).

4.5 The odds Ratio Results

SPSS OUTPUT 13 The following odds ratios were calculated using the formula

$$OR = e^{B}$$

for every covariate used in the study.

                        Odds Ratio   95% C.I. for EXP(B)
                                     Lower      Upper
Step 1a   Edlev         .828         .573       1.195
          Loan          1.000        1.000      1.000
          Balance       1.000        1.000      1.000
          Collateral    1.000        1.000      1.000
          Interest      .899         .828       .975
          Days          1.002        1.000      1.003
          Gender        .574         .227       1.456
          Constant      65.328
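The odds ratios and their confidence limits can be reproduced approximately from the B and S.E. columns of SPSS OUTPUT 11. The sketch below assumes the usual Wald-type interval, exp(B ± 1.96 × S.E.); the function name is hypothetical, and because the inputs are the rounded table entries the results agree with the SPSS figures only to about two decimal places.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for one coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


# Rounded B and S.E. values taken from SPSS OUTPUT 11.
coefficients = {
    "Edlev": (-0.189, 0.187),
    "Interest": (-0.107, 0.042),
    "Gender": (-0.554, 0.475),
}

results = {name: odds_ratio_with_ci(b, se) for name, (b, se) in coefficients.items()}
for name, (ratio, lower, upper) in results.items():
    # An interval that contains 1 means the effect is not significant at 5%.
    contains_one = lower < 1.0 < upper
    print(f"{name}: OR = {ratio:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"contains 1: {contains_one}")
```

Only the interval for Interest excludes 1, which matches the significance pattern of the Wald tests for these three covariates.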


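• Gender has an odds ratio of 0.574 with a 95% confidence interval of 0.227 to 1.456. With Male coded 1, Female coded 2 and not defaulting coded 1, the odds of not defaulting for women are 0.574 times those for men; because the interval contains 1, this difference is not statistically significant.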

4.6 Model Assessment

SPSS OUTPUT 13

Block 1: Method = Forward Stepwise (Likelihood Ratio)

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      134.478a            .038                   .062
2      128.330a            .078                   .126
3      123.998b            .104                   .170

a. Estimation terminated at iteration number 5 because

parameter estimates changed by less than .001.

b. Estimation terminated at iteration number 6 because

parameter estimates changed by less than .001.

The Nagelkerke statistic in the far right-hand column is a good approximation to the R-squared statistic, having a maximum possible value of 1.00. It suggests that approximately 17% of the variation in the dependent variable is explained by the three predictors in our final model.

SPSS OUTPUT 15

Variables in the Equation

                      B       S.E.    Wald     df   Sig.   Exp(B)   95% C.I. for EXP(B)
                                                                    Lower     Upper
Step 1a   Days        .002    .001    5.118    1    .024   1.002    1.000     1.003
          Constant    .822    .337    5.947    1    .015   2.274
Step 2b   Interest    -.068   .029    5.346    1    .021   .934     .882      .990
          Days        .002    .001    7.869    1    .005   1.002    1.001     1.004
          Constant    1.815   .579    9.833    1    .002   6.143
Step 3c   Balance     .000    .000    4.216    1    .040   1.000    1.000     1.000
          Interest    -.106   .040    7.147    1    .008   .900     .832      .972
          Days        .002    .001    4.888    1    .027   1.002    1.000     1.003
          Constant    2.966   .901    10.827   1    .001   19.415

a. Variable(s) entered on step 1: Days.

b. Variable(s) entered on step 2: Interest.

c. Variable(s) entered on step 3: Balance.

Output 15 shows that our stepwise model-building process included three steps. In

the first step, a constant as well as the Days predictor variable are entered into the

model. At the second step Interest is added to the model. And the final step adds

Balance.

This confirms the three predictor variables that were previously stated as being

significant.

The “B” column shows the coefficients (called Beta Coefficients, abbreviated with a “B”) associated with each predictor. Because the dependent variable is coded 1 for not defaulting (SPSS OUTPUT 4), the negative coefficient of Interest indicates that customers charged higher interest have lower odds of not defaulting, that is, they are somewhat more likely to default on a loan. The coefficients of Balance and Days are positive, so larger values of Balance and Days are associated with higher odds of not defaulting on a loan.

SPSS OUTPUT 15

Omnibus Tests of Model Coefficients

Chi-square Df Sig.

Step 1

Step 5.736 1 .017

Block 5.736 1 .017

Model 5.736 1 .017

Step 2

Step 6.148 1 .013

Block 11.884 2 .003

Model 11.884 2 .003

Step 3

Step 4.332 1 .037

Block 16.216 3 .001

Model 16.216 3 .001

Overall Chi-square test

H0: βi = 0 for all i

H1: βi ≠ 0 for at least one coefficient

H0 is rejected since the p-value < 0.05 in all three steps.

Hence the model is significant.
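The chi-square values in the omnibus tests are differences in -2 log likelihood between successive models. The sketch below reproduces them from the -2 log likelihood values in the Model Summary table, and derives the constant-only value from the 120 non-defaulters and 27 defaulters in the modeling sample. It is an illustration under those assumptions, with hypothetical function names, and it only handles chi-square tests with 1 degree of freedom.

```python
import math


def chi2_sf_1df(value):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(value / 2.0))


def null_neg2ll(n_ones, n_zeros):
    """-2 log likelihood of the constant-only model for a 0/1 outcome."""
    n = n_ones + n_zeros
    return -2.0 * (n_ones * math.log(n_ones / n) + n_zeros * math.log(n_zeros / n))


# Constant-only model, followed by steps 1 to 3 from the Model Summary table.
neg2ll = [null_neg2ll(120, 27), 134.478, 128.330, 123.998]

step_tests = []
for previous, current in zip(neg2ll, neg2ll[1:]):
    step_chi2 = previous - current            # improvement from adding one variable
    step_tests.append((step_chi2, chi2_sf_1df(step_chi2)))

for step, (chi2, p_value) in enumerate(step_tests, start=1):
    print(f"Step {step}: chi-square = {chi2:.3f}, p = {p_value:.3f}")
```

Each step adds a single predictor, so each step chi-square is referred to a chi-square distribution with 1 degree of freedom; the block and model rows accumulate these improvements.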


4.7 Classification and Validation

SPSS OUTPUT 16 Classification Table (a)

                                        Predicted
                                        Selected Cases (b)                       Unselected Cases (c,d)
                                        previously defaulted    Percentage       previously defaulted    Percentage
Observed                                Yes       No            Correct          Yes       No            Correct
Step 1   previously defaulted   Yes     0         27            .0               0         18            .0
                                No      0         120           100.0            0         112           100.0
         Overall Percentage                                     81.6                                     86.2
Step 2   previously defaulted   Yes     0         27            .0               0         18            .0
                                No      1         119           99.2             3         109           97.3
         Overall Percentage                                     81.0                                     83.8
Step 3   previously defaulted   Yes     1         26            3.7              1         17            5.6
                                No      1         119           99.2             3         109           97.3
         Overall Percentage                                     81.6                                     84.6

a. The cut value is .500

b. Selected cases validate EQ 1

c. Unselected cases validate NE 1

d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical

variables with values out of the range of the selected cases.

Output 16 clearly shows that the model correctly classified about 99.2% of the modeling

sample’s non-defaulters and about 4% of the modeling sample’s defaulters, for an overall

correct classification percentage of about 82%. Similarly, when applied to the holdout or

validation sample, the model correctly identified about 97% of the non-defaulters and about

6% of the defaulters, for an overall correct classification percentage of about 85%.
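The percentages quoted above follow directly from the Step 3 counts in the classification table. A minimal sketch of the arithmetic, with a hypothetical function name, is:

```python
def classification_rates(def_pred_yes, def_pred_no, non_pred_yes, non_pred_no):
    """Percentage correct for defaulters, non-defaulters and overall.

    def_pred_yes / def_pred_no: observed defaulters predicted Yes / No.
    non_pred_yes / non_pred_no: observed non-defaulters predicted Yes / No.
    """
    defaulters = def_pred_yes + def_pred_no
    non_defaulters = non_pred_yes + non_pred_no
    correct = def_pred_yes + non_pred_no
    return {
        "defaulters_correct": 100.0 * def_pred_yes / defaulters,
        "non_defaulters_correct": 100.0 * non_pred_no / non_defaulters,
        "overall_correct": 100.0 * correct / (defaulters + non_defaulters),
    }


# Step 3 counts from SPSS OUTPUT 16.
modeling = classification_rates(1, 26, 1, 119)     # selected cases
holdout = classification_rates(1, 17, 3, 109)      # unselected cases

for name, rates in (("modeling", modeling), ("holdout", holdout)):
    print(name, {key: round(value, 1) for key, value in rates.items()})
```

Reporting the two class-specific rates alongside the overall rate makes it visible that the overall figure is driven almost entirely by the non-defaulters, who form the large majority of both samples.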


SPSS OUTPUT 17

SPSS OUTPUT 17 is our modeling graph. The right-hand graph shows that the modeling process assigned the bulk of the actual non-defaulters very low probabilities of defaulting, and the left-hand graph shows that the model assigned the bulk of the defaulters very high probabilities of defaulting. This adds further confirmation that we have a good model.

Since we have a valid predictive model, we can use it to score a prospect file. The graph below shows the result after we have scored our 23 prospects.

It shows that none of the prospects would be expected to default on a loan.


4.8 Receiver Operating Characteristic (ROC) Curve

Case Processing Summary

previously defaulted    Valid N (listwise)
Positive (a)            232
Negative                45
Missing                 23

Larger values of the test result variable(s) indicate stronger evidence for a positive actual state.

a. The positive actual state is No.

The further the curve lies above the reference line, the more accurate the test. Here, the curve lies well above the reference line.


Area Under the Curve

Test Result Variable(s): Predicted probability

Area    Std. Error (a)   Asymptotic Sig. (b)   Asymptotic 95% Confidence Interval
                                               Lower Bound    Upper Bound
.765    .040             .000                  .687           .844

The test result variable(s): Predicted probability has at least one tie between the positive actual state group and the negative actual state group. Statistics may be biased.

a. Under the nonparametric assumption

b. Null hypothesis: true area = 0.5

The area under the curve is .765 with 95% confidence interval (.687, .844). Also, the area under the curve is significantly different from 0.5, since the p-value is .000, meaning that the logistic regression classifies the groups significantly better than chance.


CHAPTER FIVE

5.0 Summary of Findings, Conclusion and Recommendation

5.1 Summary of Findings and Conclusion

In this study, some customers’ accounts at Fidelity Bank PLC as at 5th of May, 2013 were examined using Binary Logistic Regression, and a model was built which lenders at the bank can use to predict the probability that a potential borrower will default. It examines the dichotomous dependent variable (default) using the independent variables (Loan, Balance, Collateral, Interest, Number of Days, Gender and Education Level), which are either continuous or categorical. We have demonstrated the use of risk modeling with logistic regression analysis to identify demographic and behavioral characteristics associated with the likelihood of defaulting on a bank loan. Significance testing using the Wald test and the likelihood ratio showed that, at the 5% level of significance, Balance, Interest and Days are significant, but Education Level, Loan, Collateral and Gender are not significant by the Wald statistic.

Also, the area under the ROC curve is significantly different from 0.5, since the p-value is .000, meaning that the logistic regression classifies the groups significantly better than chance.

Thus, this model can be used to predict the probability that a given customer who obtains a loan will default.

In conclusion, the model has shown that lenders should always take Balance, Days and Interest into consideration before giving out loans.


5.2 Recommendations

The researcher recommends the following:

1. The model of this research is highly recommended for the bank.

2. Large loans should not be granted to customers for only a few days, because such customers might find it difficult to pay back before the deadline.

3. Customers who have obtained loans should be reminded of their due date whenever the deadline is near, in order to prompt payment.

4. The character of a customer should be carefully considered, in particular whether he/she is the type that habitually defaults on loans.

5. Lenders should take the Interest rate, Balance, and Days into consideration before giving out loans.


REFERENCES

1. Ainsworth, Logistic Regression

2. Altman, E.I.; Edward, I.; Haldeman, R.; Narayanan, P. A New Model to

Identify Bankruptcy risk of corporation. Journal of Banking and Finance,

1977, 1, 29–54.

3. Amr I. Abdelrahman, Applying Logistic Regression Model to The Second

Primary Cancer Data;Department of Statistics, Mathematics, and

Insurance. Faculty of Commerce, Ain Shams University, Egypt.

4. Aziz, A.; Emanuel, D.; Lawson, G. Bankruptcy Prediction – An investigation

of cash flow based models. Journal of Management Studies, 1998, 25, 419–

437.

5. Bogess, W.B., 1967. Screen-test your Credit Risk. Journal of Harvard

Business Review. Volume 45, pp 21-113.

6. Cramer J.S. (2003): The Origin and development of logit Model, Cambridge

University Press: Cambridge.

7. Hand, D.J (2010): Modeling Consumer Risk, IMA Journal of Management

Mathematics, 12,137-255

8. Karl L. Wuensch, Dept of Psychology, East Carolina University, Binary Logistic Regression with SPSS/PASW

9. Menard, S. (1995). Applied Logistic Regression Analysis, Sage Publications, Newbury Park: California

10. Mogboyin, O., T.O. Asaolu and O.T. Ajilore, 2012. Bank Consolidation

Program and Lending Performance in Nigerian Banking System: An

Empirical Analysis with Panel Data. The International Journal of Applied

Economics and Finance, 6: 100-108.

11. Pompe, P.P.M.; Bilderbe, J. The Prediction of Bankruptcy of Small- and

Medium-sized Industrial Firms. Journal of Business Venturing, 2005, 20,

847–868.

12. www.smalldrill.com/logistic-regression.html

13. www.wikipeadia.com


APPENDIX 1

Loans Balance Collateral Interest Days Gender EdLev Default Validate PRE_1 PGR_1 COO_1

192000 0 320000 28 808 1 1 1 1 0.87567 1 0.00583

384000 0 64000 23 682 2 5 1 0 0.70441 1 0.02797

5000000 4625600 15000000 6 39 1 3 0 1 0.81977 1 0.91138

350000 0 350000 28 1625 2 4 1 1 0.90688 1 0.01069

2250000 2063929 6600000 6 77 1 2 1 1 0.9074 1 0.00392

384000 2000.15 640000 28 694 2 3 0 0 0.70889 1 0.11539

1000000 0 2100000 27 938 1 1 1 0 0.93834 1 0.00189

5000000 3751320 13600000 6 77 1 5 1 1 0.84765 1 0.02331

160000 0 320000 28 871 2 3 1 0 0.75648 1 0.01825

272000 0 320000 23 983 1 4 1 1 0.90235 1 0.00355

380000 0 380000 28 1225 2 4 0 0 0.82823 1 0.43932

5039650 0 5056723 6 435 1 4 1 0 0.98268 1 0.00045

787500 734915.2 15000000 6 10 2 1 1 1 0.99622 1 0.0001

405000 3198582 6000000 6 133 1 1 1 1 0.79734 1 0.04591

272000 0 320000 23 730 2 3 1 0 0.80374 1 0.0075

4952187 0 4952187 6 532 2 5 1 1 0.96906 1 0.00142

4768904 0 4768904 6 10 1 5 1 0 0.95415 1 0.00295

787500 190671.9 15000000 6 10 1 3 1 0 0.99818 1 0.00002

678888 740847.4 12000000 6 10 1 2 1 1 0.99403 1 0.00015

4050000 3654383 10000000 6 9 2 4 0 1 0.60024 1 0.22482

3000000 2763470 4500000 6 10 1 2 1 1 0.6934 1 0.05286

5039650 18091.7 5039660 6 8 2 1 1 1 0.96342 1 0.0026

192000 0 320000 23 983 1 3 1 0 0.91821 1 0.00216

192000 0 320000 23 703 2 4 1 0 0.76467 1 0.01261

787500 746667.4 1500000 6 213 1 3 0 0 0.91574 1 0.35998

300000 0 600000 17 897 1 5 1 1 0.93073 1 0.00253

4500000 0 5500000 25 930 2 2 1 1 0.94632 1 0.0039

200000 0 600000 36 149 2 4 1 0 0.24676 0 0.39715

1750000 127059 5000000 25 72 1 5 1 1 0.77888 1 0.03178

2560000 0 3000000 28 118 1 2 1 0 0.75243 1 0.03596

3000000 0 15000000 30 633 1 3 . . 0.99254 1 .

192000 0 500000 0 1105 2 3 1 0 0.98991 1 0.00015

192000 0 320000 23 875 1 1 1 1 0.93118 1 0.00178

100000 0 100000 0 534 2 2 1 0 0.97491 1 0.00067

50000 0 50000 0 546 1 5 1 1 0.97486 1 0.0008

125000 0 125000 0 722 2 3 1 0 0.97831 1 0.00053

1440000 1554769 1440000 6 343 1 1 1 1 0.89122 1 0.00685

5000000 935319.3 10000000 6 133 1 5 1 0 0.97624 1 0.00074

700000 0 700000 4 507 1 3 1 1 0.97523 1 0.00045

5000000 4555817 10000000 6 337 1 4 0 0 0.63438 1 0.4135

300000 0 600000 17 897 2 2 0 1 0.93156 1 0.34938

5000000 4630478 10000000 6 10 1 1 1 0 0.61348 1 0.18744

5000000 4897826 10000000 6 99 2 3 1 1 0.3573 0 0.53502


384000 0 384000 23 771 1 4 1 1 0.86501 1 0.00437

300000 5923.08 300000 23 686 2 3 1 0 0.7889 1 0.00789

800000 0 800000 21 701 1 2 0 1 0.91764 1 0.18564

5290221 0 5500000 21 765 1 1 1 0 0.9757 1 0.00107

300000 0 300000 23 633 2 4 1 1 0.73911 1 0.01345

600000 0 300000 23 393 2 5 1 0 0.6001 1 0.04144

46000 0 46000 23 1065 1 5 1 0 0.89286 1 0.00674

500000 0 500000 23 350 1 2 1 1 0.81953 1 0.00582

180000 0 180000 23 864 2 1 . . 0.88011 1 .

500000 0 500000 21 832 1 1 1 1 0.94105 1 0.0013

2520000 2358182 2520000 6 9 1 3 1 1 0.62953 1 0.09056

1440000 1372656 1440000 6 15 1 4 1 0 0.75794 1 0.02856

1050000 801584.6 1050000 6 34 2 2 1 1 0.81831 1 0.01554

2880000 1968066 2880000 6 15 1 5 1 0 0.65406 1 0.08158

1440000 1207576 1440000 6 34 2 3 0 1 0.72696 1 0.20884

1440000 1307209 1440000 6 15 1 2 1 0 0.83016 1 0.01357

1440000 1318354 1440000 6 15 2 5 1 1 0.61152 1 0.08602

980000 707375.7 980000 6 34 2 3 . . 0.80189 1 .

1440000 1099316 1440000 6 34 2 1 1 1 0.81283 1 0.02011

1440000 1099316 1440000 6 34 1 1 0 0 0.88317 1 0.45319

1440000 1272962 1440000 6 15 1 5 1 0 0.74166 1 0.04063

1440000 1347237 1440000 6 15 1 4 1 1 0.76269 1 0.02777

1440000 1470634 1440000 6 337 1 3 1 0 0.85824 1 0.00879

1440000 1339071 1440000 6 15 2 4 . . 0.65057 1 .

1440000 1307209 1440000 6 15 2 5 . . 0.61424 1 .

2880000 2621569 2880000 6 160 1 2 . . 0.68796 1 .

2880000 2504823 2880000 6 164 2 4 . . 0.49625 0 .

2880000 1521562 2880000 6 652 1 3 . . 0.93103 1 .

2880000 2678142 2880000 6 15 2 2 . . 0.4803 0 .

2880000 2860578 2880000 6 71 1 2 . . 0.59569 1 .

787500 746667.4 1500000 6 213 2 4 1 1 0.83786 1 0.0119

300000 0 600000 17 897 1 1 1 0 0.96625 1 0.00055

4500000 0 5500000 25 930 2 1 1 0 0.95515 1 0.00316

200000 0 600000 36 149 1 1 1 1 0.50141 1 0.21966

1750000 127059 5000000 25 72 2 4 0 1 0.70969 1 0.32739

2560000 0 3000000 28 118 1 3 1 1 0.71556 1 0.03978

3000000 0 15000000 30 633 2 4 1 0 0.98443 1 0.0012

192000 0 320000 23 875 2 2 1 0 0.86547 1 0.00555

192000 0 320000 23 615 1 3 1 1 0.85394 1 0.00328

192000 0 320000 23 688 1 5 1 0 0.82011 1 0.01103

192000 0 192000 23 771 1 1 1 1 0.91571 1 0.00232

1533333 0 1000000 0 617 2 1 1 0 0.98438 1 0.00031

400000 0 400000 0 562 2 2 1 1 0.97749 1 0.00054

125000 0 125000 0 722 1 2 1 0 0.98957 1 0.00013

100000 0 100000 0 533 1 2 1 1 0.9854 1 0.00023

100000 0 100000 0 533 1 3 1 0 0.98242 1 0.00033


50000 0 50000 0 546 2 2 1 1 0.97518 1 0.00066

50000 0 50000 0 534 1 4 1 0 0.97866 1 0.00051

50000 0 50000 0 534 2 4 1 1 0.96343 1 0.00146

100000 0 100000 0 534 1 1 1 1 0.98791 1 0.00017

192000 0 500000 0 1105 2 2 1 1 0.99164 1 0.00011

5329969 0 6000000 6 343 1 4 1 1 0.98393 1 0.00041

5000000 4723548 10000000 6 142 2 4 . . 0.37259 0 .

5000000 4676762 10000000 6 1 1 2 . . 0.55217 1 .

5000000 4899992 10000000 6 34 1 1 . . 0.55674 1 .

5000000 4318165 10000000 6 161 2 5 . . 0.43524 0 .

5000000 4348778 10000000 6 8 1 3 . . 0.5913 1 .

700000 0 700000 4 507 2 5 1 1 0.93938 1 0.00314

1440000 1554769 1440000 6 343 1 5 . . 0.79362 1 .

2295000 0 2295000 6 547 2 3 0 1 0.96455 1 0.6775

5000000 4555817 10000000 6 337 1 2 . . 0.71691 1 .

5000000 935319.3 10000000 6 133 1 1 1 0 0.98871 1 0.00022

5000000 4451327 10000000 5 41 1 4 1 1 0.5598 1 0.16949

350000 0 350000 28 1624 2 5 1 0 0.88946 1 0.01681

2500000 0 2555555 30 1226 1 4 1 1 0.9142 1 0.00619

4483999 0 8483999 30 724 1 3 1 1 0.95918 1 0.00231

4875000 0 6500000 30 701 2 1 1 0 0.9143 1 0.01356

2388661 0 5000000 30 1570 2 2 1 0 0.9701 1 0.00137

160000 0 320000 28 1381 2 3 1 1 0.88469 1 0.01055

192000 0 320000 28 871 1 4 1 1 0.81705 1 0.01093

384000 0 384000 28 119 2 2 1 0 0.49785 0 0.09249

4133285 0 2388661 21 197 1 5 1 1 0.76035 1 0.06145

192000 0 320000 28 666 2 2 1 0 0.7225 1 0.02047

192000 0 320000 28 989 1 3 1 1 0.8693 1 0.00592

192000 0 320000 28 806 2 4 1 0 0.69571 1 0.02903

262515 0 640000 28 938 2 5 1 1 0.7221 1 0.04006

192000 0 320000 28 722 1 2 1 0 0.83348 1 0.00679

192000 0 320000 28 808 1 1 1 1 0.87567 1 0.00583

384000 1938.19 640000 28 694 2 5 0 0 0.62525 1 0.14573

192000 0 320000 28 1014 1 2 1 1 0.89361 1 0.00439

4050000 3861301 8500000 6 3376 2 1 . . 0.9982 1 .

5000000 3751320 13600000 6 8 1 3 . . 0.87784 1 .

2250000 2063929 6600000 6 77 2 4 . . 0.79409 1 .

3600000 0 9200000 6 1408 2 5 1 0 0.99809 1 0.00001

5000000 4625600 15000000 6 39 1 5 1 1 0.75706 1 0.08535

384000 0 640000 23 682 1 1 1 1 0.91193 1 0.00234

384000 0 640000 23 682 2 4 1 1 0.77133 1 0.01109

5000000 3955074 17000000 6 72 1 5 1 1 0.91931 1 0.01389

4000000 3844555 9600000 67 314 2 1 1 0 0.00495 0 7.26961

1500000 0 2700000 28 1353 1 3 1 0 0.95703 1 0.00141

1500000 0 2700000 30 633 2 4 1 1 0.70471 1 0.02986

1000000 0 2100000 27 938 1 5 1 0 0.87719 1 0.00676


192000 0 320000 23 967 2 1 1 1 0.90147 1 0.00463

192000 0 320000 23 825 1 4 1 1 0.87534 1 0.00428

2520000 0 2520000 6 961 1 2 1 1 0.99205 1 0.00007

2520000 0 2520000 6 377 1 1 1 0 0.98167 1 0.00033

300000 0 300000 23 388 2 4 1 0 0.64725 1 0.02172

200000 0 200000 18 861 2 5 1 0 0.85446 1 0.01014

200000 0 200000 21 919 1 4 1 1 0.90853 1 0.00291

1000000 0 1000000 23 681 1 3 . . 0.88221 1 .

945857 1116760 823000 23 314 2 1 0 1 0.49896 0 0.11353

900523 804888.1 922000 23 314 1 4 0 1 0.58257 1 0.08205

850000 25167.92 850000 23 69 2 5 0 0 0.48487 0 0.08066

450000 0 450000 23 540 2 3 1 0 0.74957 1 0.00872

382000 0 3820000 27 938 1 5 1 1 0.92287 1 0.00451

368600 237983.6 350000 23 38 2 3 0 0 0.48509 0 0.06221

282235 0 445000 23 178 1 5 1 0 0.65499 1 0.04302

248500 0 248500 23 279 2 3 1 1 0.64388 1 0.02117

240000 0 240000 21 559 1 1 1 0 0.90318 1 0.00261

800000 0 800000 23 176 2 2 1 1 0.67084 1 0.02651

463500 0 463500 23 315 1 5 1 1 0.70607 1 0.026

2553610 0 2553610 6 364 1 4 1 0 0.96762 1 0.00078

2722619 0 2722619 6 532 2 3 1 0 0.96658 1 0.0009

600000 0 600000 23 287 1 2 1 1 0.80565 1 0.00733

1000000 0 1000000 24 906 2 4 1 0 0.82667 1 0.00954

1268000 1314449 1127725 23 389 1 1 0 0 0.63218 1 0.19487

763430 922383.4 702475 23 682 2 2 1 1 0.65434 1 0.04021

350000 0 350000 18 938 1 3 0 1 0.94633 1 0.32025

362140 263843.6 350000 23 246 2 4 0 1 0.52346 1 0.05726

191170 167574.7 195000 23 296 2 5 1 1 0.51538 1 0.07663

840000 0 840000 23 223 1 1 1 0 0.82447 1 0.01057

441000 0 441000 18 902 2 2 1 0 0.92125 1 0.00232

500000 0 500000 21 744 1 3 1 1 0.90345 1 0.00182

240000 0 240000 21 212 1 3 1 0 0.77551 1 0.0094

240000 0 240000 21 212 2 2 1 1 0.70567 1 0.01943

240000 0 240000 21 510 1 3 1 0 0.85421 1 0.00305

4700000 0 4700000 21 633 2 4 1 1 0.89644 1 0.01122

2400000 2938.74 2400000 23 273 1 1 . . 0.87575 1 .

300000 0 300000 19 746 2 5 1 1 0.81455 1 0.01325

445000 0 445000 23 531 1 3 1 0 0.83667 1 0.00355

310000 0 310000 21 531 2 4 1 0 0.74577 1 0.01148

500000 0 500000 21 996 1 5 1 0 0.90927 1 0.00422

217200 0 382000 234 206 2 2 1 0 0 0 78.40688

500000 0 500000 0 701 1 1 1 1 0.99169 1 0.00009

660000 0 660000 21 834 2 4 1 0 0.84359 1 0.00681

13500000 0 13500000 19 967 1 3 1 1 0.99599 1 0.00015

382000 0 282000 23 162 2 5 1 0 0.50165 1 0.08282

400000 0 400000 19 765 1 3 1 0 0.92176 1 0.00142


254000 377459.3 254000 23 891 1 2 0 1 0.88436 1 0.24587

346940 0 454000 23 100 2 4 1 1 0.53382 1 0.05824

368500 0 368500 23 6 1 2 1 0 0.70606 1 0.02929

599850 0 599850 30 633 1 1 1 1 0.81408 1 0.01361

1000000 51512.48 1000000 23 101 2 5 0 0 0.49999 0 0.07952

467360 564662 450000 23 891 1 3 0 1 0.84452 1 0.22968

5000000 3436988 5000000 6 69 2 2 1 0 0.41906 0 0.58271

1080000 305370.9 1080000 23 9 1 4 0 1 0.58326 1 0.10135

392000 0 392000 27 688 2 3 1 0 0.71494 1 0.01682

413000 0 413000 23 526 1 4 1 1 0.80677 1 0.00647

310000 0 310000 18 935 2 3 1 0 0.90904 1 0.00302

279930 319602.3 310000 23 519 1 4 0 1 0.74454 1 0.11165

354570 354900.3 321000 23 322 2 3 0 0 0.57855 1 0.054

300000 0 300000 18 765 1 2 1 1 0.93946 1 0.00102

236000 0 236000 23 449 2 1 1 1 0.78061 1 0.01442

300000 0 300000 19 800 1 3 1 0 0.92471 1 0.00143

996900 1100884 823000 23 195 1 4 0 0 0.44638 0 0.07141

2510096 0 1032000 23 287 2 5 1 1 0.57107 1 0.08519

300000 292.55 300000 21 393 2 3 0 1 0.73464 1 0.07573

1153298 789.49 1200000 21 150 1 2 0 1 0.8203 1 0.17397

1030000 0 1030000 18 760 2 2 1 1 0.91122 1 0.00245

289940 36459.25 500000 21 223 1 5 0 1 0.71333 1 0.17441

480000 0 480000 21 519 2 1 1 1 0.83984 1 0.00773

240000 0 240000 21 212 1 2 1 1 0.80671 1 0.00823

240000 0 240000 21 212 1 5 1 0 0.70298 1 0.03074

240000 0 240000 21 490 1 3 1 0 0.84974 1 0.00322

465000 0 465000 23 56 2 4 1 0 0.51307 1 0.06568

148870 0 450000 23 434 2 3 1 0 0.71695 1 0.01254

4000000 0 4000000 19 877 1 2 1 0 0.97323 1 0.00076

812000 51070.1 812000 23 55 2 3 0 0 0.56425 1 0.07641

463500 0 463500 23 197 1 5 1 0 0.66087 1 0.03849

940905 0 940905 23 213 2 2 1 0 0.69136 1 0.02211

240000 0 240000 21 212 1 2 1 1 0.80671 1 0.00823

480000 0 480000 21 212 2 1 1 0 0.75265 1 0.02091

240000 0 240000 21 510 2 4 1 1 0.73587 1 0.01264

240000 0 240000 21 492 2 5 1 0 0.69074 1 0.02775

72000 0 72000 4 441 1 3 1 1 0.96854 1 0.00074

2934000 0 2934000 6 532 2 2 1 0 0.97333 1 0.00067

480000 0 480000 21 212 1 5 1 0 0.71317 1 0.02661

5000000 0 5000000 21 883 1 4 1 1 0.96148 1 0.00195

400000 0 400000 21 175 2 2 1 1 0.69883 1 0.02121

286000 0 286000 18 864 2 1 1 1 0.92753 1 0.0024

2750000 0 2750000 6 427 1 5 1 1 0.96644 1 0.00096

400000 0 400000 21 393 2 3 1 1 0.73868 1 0.00929

1500000 0 1500000 4 526 1 2 1 1 0.98304 1 0.00024

650000 0 650000 23 567 1 2 1 1 0.8731 1 0.00281


4725000 0 4725000 6 756 2 4 1 0 0.98172 1 0.00051

4725000 0 4725000 6 756 1 2 1 0 0.99273 1 0.00009

4725000 10011.88 4725000 6 756 2 2 . . 0.98728 1 .

4725000 0 4725000 6 756 1 2 1 0 0.99273 1 0.00009

4725000 0 4725000 6 756 2 5 1 1 0.978 1 0.00075

2520000 0 2520000 6 961 2 1 1 0 0.98859 1 0.00017

2520000 0 2520000 6 377 1 4 1 1 0.96812 1 0.00075

300000 0 300000 23 388 1 5 1 0 0.72556 1 0.02257

200000 0 200000 18 861 1 4 1 1 0.92508 1 0.00202

200000 0 200000 21 919 2 2 1 0 0.8928 1 0.00394

1000000 0 1000000 23 681 1 1 1 0 0.91619 1 0.00236

945850 1116760 823000 23 314 2 1 0 1 0.49896 0 0.11353

900520 804888.1 922000 23 314 2 1 0 0 0.58571 1 0.12401

850000 25167.92 850000 23 69 1 3 0 1 0.70515 1 0.12134

450000 0 450000 23 540 2 4 1 0 0.71244 1 0.01396

382000 0 382000 27 938 1 2 1 1 0.8913 1 0.00388

368610 237983.6 350000 23 38 2 3 0 0 0.48509 0 0.06221

382230 0 445000 23 178 1 4 1 1 0.69491 1 0.02206

248500 0 248500 23 279 2 5 1 0 0.55332 1 0.06149

240000 0 240000 21 559 1 2 1 0 0.88534 1 0.00233

800000 0 800000 23 176 2 3 1 1 0.62782 1 0.02569

463500 0 463500 23 315 1 1 1 1 0.83654 1 0.008

2553610 0 2553610 6 364 1 5 1 0 0.96114 1 0.00122

2722600 0 2722619 6 532 2 2 1 1 0.97218 1 0.00069

600000 0 600000 23 287 1 3 1 0 0.77433 1 0.00773

1000000 0 1000000 24 906 1 4 1 1 0.8925 1 0.00357

1268000 1314449 1127725 23 389 2 5 0 1 0.31668 0 0.05142

763440 922383.6 702475 23 682 1 5 0 0 0.65141 1 0.20369

350000 0 350000 18 938 2 2 1 1 0.92446 1 0.00229

362149 263843.6 350000 23 246 1 4 0 1 0.65661 1 0.09467

191100 167574.7 195000 23 296 2 3 0 1 0.60819 1 0.06204

840000 0 840000 23 223 1 3 1 0 0.76292 1 0.00959

441000 0 441000 18 902 2 1 1 1 0.93392 1 0.00212

500000 0 500000 21 744 1 1 1 0 0.93178 1 0.00155

240000 0 240000 21 212 1 4 1 1 0.74089 1 0.01545

240000 0 240000 21 212 2 5 1 0 0.57621 1 0.05803

240000 0 240000 21 510 1 2 1 1 0.87622 1 0.00265

4700000 0 4700000 21 633 1 3 1 0 0.94793 1 0.00294

5529600 242895 5529603 23 160 2 5 0 1 0.69805 1 0.75065

125700 0 193203 18 430 2 3 1 0 0.80018 1 0.00699

480000 0 480000 21 212 1 2 1 1 0.81428 1 0.00741

300000 0 300000 19 332 2 4 1 0 0.71809 1 0.01503

4200000 0 11200000 4 758 1 1 1 1 0.99921 1 0

1800000 0 3000000 23 266 2 3 1 1 0.77164 1 0.01396

560000 0 560000 4 55 1 1 1 1 0.96162 1 0.00132

5084000 0 7000000 16 51 1 5 1 1 0.93285 1 0.00559


6355000 0 7000000 16 427 2 3 1 0 0.95406 1 0.00417

4500000 0 4500000 22 160 1 2 1 0 0.89132 1 0.01355

5000000 0 7000000 16 713 2 1 1 0 0.98224 1 0.00056

5080000 3279922 5080000 16 44 1 5 0 0 0.21849 0 0.08556

436000 0 436000 23 162 1 3 1 1 0.72664 1 0.01522

350000 249266.7 350000 23 100 2 4 0 1 0.46279 0 0.05757

760000 0 760000 4 91 2 2 1 1 0.92974 1 0.0037

250000 0 250000 23 491 1 1 1 0 0.86999 1 0.00458

4600000 0 10000000 23 175 1 1 1 0 0.9763 1 0.00119

5000000 361289.7 5000000 23 273 2 4 0 0 0.73051 1 0.57461

2450000 0 3500000 23 17 1 3 1 0 0.80576 1 0.01663

1000000 0 10800000 23 440 2 1 1 1 0.98377 1 0.00098

500000 0 500000 23 521 2 5 1 1 0.66702 1 0.0302

800000 0 800000 23 293 1 1 1 0 0.84061 1 0.00805

450000 563219.8 450000 23 273 2 3 0 1 0.51125 1 0.04975

8000000 0 22000000 23 246 1 1 1 0 0.999 1 0.00001

4500000 0 22000000 23 62 2 4 1 1 0.99667 1 0.00011

4500000 0 18000000 23 356 1 5 1 0 0.99589 1 0.0001

75000000 0 1E+08 21 365 2 3 1 1 1 1 0

150000 0 150000 4 55 1 4 1 1 0.92889 1 0.00425

187300 0 200000 4 562 2 5 1 0 0.93914 1 0.00342

400000 0 400000 0 562 1 2 1 1 0.98694 1 0.00018

50000 0 50000 0 542 1 3 1 0 0.98252 1 0.00033

1000000 0 2000000 23 540 1 4 1 1 0.86406 1 0.00332

5080000 0 5080000 16 371 2 5 1 0 0.89252 1 0.01352

4132335 3987054 4132335 16 14 1 5 0 1 0.09553 0 0.01992

660000 0 660000 21 834 1 3 1 1 0.91898 1 0.00157


DATASET ACTIVATE DataSet2.

GET

FILE='C:\Users\Dr. Faith Adebisi\Documents\PETER SPSS.sav'.

DATASET NAME DataSet5 WINDOW=FRONT.

GET

FILE='C:\Users\Dr. Faith Adebisi\Documents\Music\Desktop\New folder (3)\DR ADAMU SPSS. 23.sav'.

DATASET NAME DataSet6 WINDOW=FRONT.

GRAPH

/HISTOGRAM=PRE_1

/PANEL COLVAR=Default COLOP=CROSS.


CROSSTABS

/TABLES=Default BY validate

/FORMAT=AVALUE TABLES

/CELLS=COUNT ROW COLUMN

/COUNT ROUND CELL.

Crosstabs

Case Processing Summary

                                  Cases
                                  Valid            Missing          Total
                                  N     Percent    N     Percent    N     Percent
previously defaulted * validate   277   92.3%      23    7.7%       300   100.0%


previously defaulted * validate Crosstabulation

                                                            Validate           Total
                                                            0        1
previously defaulted   Yes   Count                          18       27        45
                             % within previously defaulted  40.0%    60.0%     100.0%
                             % within validate              13.8%    18.4%     16.2%
                       No    Count                          112      120       232
                             % within previously defaulted  48.3%    51.7%     100.0%
                             % within validate              86.2%    81.6%     83.8%
Total                        Count                          130      147       277
                             % within previously defaulted  46.9%    53.1%     100.0%
                             % within validate              100.0%   100.0%    100.0%

ROC Curve

Case Processing Summary

previously defaulted   Valid N (listwise)
Positive(a)            232
Negative               45
Missing                23

Larger values of the test result variable(s) indicate stronger evidence for a positive actual state.

a. The positive actual state is No.


Area Under the Curve

Test Result Variable(s): Predicted probability

Area   Std. Error(a)   Asymptotic Sig.(b)   Asymptotic 95% Confidence Interval
                                            Lower Bound   Upper Bound
.765   .040            .000                 .687          .844

The test result variable(s): Predicted probability has at least one tie between the positive actual state group and the negative actual state group. Statistics may be biased.

a. Under the nonparametric assumption

b. Null hypothesis: true area = 0.5
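The area under the ROC curve can be read as the probability that a randomly chosen "No" case (the positive state) receives a higher predicted probability than a randomly chosen "Yes" case. The following is a minimal Python sketch of that pairwise definition, not the SPSS procedure itself. It uses only the first eight rows of Appendix 1 as a toy sample, and it assumes that Default = 1 codes "No", which is consistent with the 232/45 split but is not stated explicitly in the output. The function name `auc_by_pairs` is hypothetical.

```python
def auc_by_pairs(labels, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Default and PRE_1 columns from the first eight rows of Appendix 1 (toy sample only).
default = [1, 1, 0, 1, 1, 0, 1, 1]
pre_1 = [0.87567, 0.70441, 0.81977, 0.90688, 0.9074, 0.70889, 0.93834, 0.84765]

auc = auc_by_pairs(default, pre_1)
print(round(auc, 3))
```

Because this sample holds only eight cases, its value is not expected to match the .765 reported for all 277 valid cases.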

LOGISTIC REGRESSION VARIABLES Default
  /SELECT=validate EQ 1
  /METHOD=FSTEP(LR) Edlev Loan Balance Collateral Interest Days Gender
  /SAVE=PRED PGROUP COOK
  /CLASSPLOT
  /PRINT=GOODFIT CI(95)
  /CRITERIA=PIN(0.05) POUT(0.10) ITERATE(20) CUT(0.5).

Logistic Regression

[DataSet1] C:\Users\Dr. Faith Adebisi\Documents\Music\Desktop\New folder (3)\DR ADAMU SPSS. 2.sav


Case Processing Summary

Unweighted Cases(a)                       N     Percent
Selected Cases     Included in Analysis   147   49.0
                   Missing Cases          0     .0
                   Total                  147   49.0
Unselected Cases                          153   51.0
Total                                     300   100.0

a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

Original Value Internal Value

Yes 0

No 1

Block 0: Beginning Block

Classification Table(a,b)

                                       Predicted
                                       Selected Cases(c)                    Unselected Cases(d,e)
                                       previously defaulted   Percentage    previously defaulted   Percentage
Observed                               Yes      No            Correct       Yes      No            Correct
Step 0   previously defaulted   Yes    0        27            .0            0        18            .0
                                No     0        120           100.0         0        112           100.0
         Overall Percentage                                   81.6                                 86.2

a. Constant is included in the model.

b. The cut value is .500

c. Selected cases validate EQ 1

d. Unselected cases validate NE 1

e. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.

Variables in the Equation

                    B       S.E.   Wald     df   Sig.   Exp(B)
Step 0   Constant   1.492   .213   49.042   1    .000   4.444
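The Step 0 row can be checked by hand, because an intercept-only logistic model simply reproduces the observed odds of the outcome. The Python sketch below is a rough check under that reading, using the 120 "No" and 27 "Yes" selected cases from the Block 0 classification table; the variable names are illustrative and not part of the SPSS output.

```python
import math

# Selected cases from the Block 0 classification table.
n_no, n_yes = 120, 27  # "No" has internal value 1, "Yes" has internal value 0

b0 = math.log(n_no / n_yes)           # constant = log odds of "No"
se = math.sqrt(1 / n_no + 1 / n_yes)  # standard error of a log odds
wald = (b0 / se) ** 2                 # Wald chi-square with 1 df
exp_b = math.exp(b0)                  # Exp(B) = odds of "No" = 120 / 27

print(round(b0, 3), round(se, 3), round(wald, 3), round(exp_b, 3))
```

These values agree with the printed B, S.E., Wald, and Exp(B) to the precision shown in the table.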

Variables not in the Equationa

Score df Sig.

Step 0 Variables

Edlev 1.730 1 .188

Loan .253 1 .615

Balance 3.874 1 .049

Collateral .438 1 .508

Interest 2.470 1 .116

Days 5.338 1 .021

Gender 1.269 1 .260

a. Residual Chi-Squares are not computed because of redundancies.


Block 1: Method = Forward Stepwise (Likelihood Ratio)

Omnibus Tests of Model Coefficients

                 Chi-square   df   Sig.
Step 1   Step    5.736        1    .017
         Block   5.736        1    .017
         Model   5.736        1    .017
Step 2   Step    6.148        1    .013
         Block   11.884       2    .003
         Model   11.884       2    .003
Step 3   Step    4.332        1    .037
         Block   16.216       3    .001
         Model   16.216       3    .001

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      134.478(a)          .038                   .062
2      128.330(a)          .078                   .126
3      123.998(b)          .104                   .170

a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.

b. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.
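The step chi-squares in the Omnibus table are the drops in -2 log likelihood from one step to the next, each tested on 1 degree of freedom. The Python sketch below illustrates that relationship with the printed values; the helper `chi2_sf_df1` is a hypothetical name, and it relies on the fact that a chi-square variate with 1 degree of freedom is a squared standard normal.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# -2 log likelihood after each forward step, from the Model Summary table.
neg2ll = {1: 134.478, 2: 128.330, 3: 123.998}

# Likelihood-ratio chi-square for the variable entered at steps 2 and 3.
step_chi2 = {2: neg2ll[1] - neg2ll[2], 3: neg2ll[2] - neg2ll[3]}
p_values = {step: chi2_sf_df1(x) for step, x in step_chi2.items()}

# The Step 1 chi-square (5.736) implies the -2 log likelihood of the constant-only model.
null_neg2ll = neg2ll[1] + 5.736

for step in (2, 3):
    print(step, round(step_chi2[step], 3), round(p_values[step], 3))
print(round(null_neg2ll, 3))
```

The differences 6.148 and 4.332 match the Step rows for Steps 2 and 3, and the implied constant-only value is 140.214.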

Hosmer and Lemeshow Test

Step Chi-square df Sig.

1 14.629 7 .041

2 8.280 8 .407

3 16.697 8 .033

Contingency Table for Hosmer and Lemeshow Test

              previously defaulted = Yes   previously defaulted = No   Total
              Observed   Expected          Observed   Expected
Step 1   1    5          4.483             10         10.517           15
         2    3          4.217             12         10.783           15
         3    2          3.976             14         12.024           16
         4    8          3.500             8          12.500           16
         5    2          2.923             14         13.077           16
         6    2          2.500             14         13.500           16
         7    1          1.938             14         13.062           15
         8    0          1.578             15         13.422           15
         9    4          1.885             19         21.115           23
Step 2   1    6          5.978             9          9.022            15
         2    5          4.557             10         10.443           15
         3    5          4.024             10         10.976           15
         4    3          3.208             13         12.792           16
         5    2          2.670             13         12.330           15
         6    1          2.341             15         13.659           16
         7    2          1.621             13         13.379           15
         8    0          1.291             15         13.709           15
         9    3          .934              12         14.066           15
         10   0          .375              10         9.625            10
Step 3   1    8          6.521             7          8.479            15
         2    7          5.065             9          10.935           16
         3    2          3.649             12         10.351           14
         4    3          3.375             12         11.625           15
         5    0          2.620             15         12.380           15
         6    3          2.096             12         12.904           15
         7    1          1.651             14         13.349           15
         8    0          1.142             15         13.858           15
         9    3          .664              12         14.336           15
         10   0          .216              12         11.784           12

Classification Table(a)

                                       Predicted
                                       Selected Cases(b)                    Unselected Cases(c,d)
                                       previously defaulted   Percentage    previously defaulted   Percentage
Observed                               Yes      No            Correct       Yes      No            Correct
Step 1   previously defaulted   Yes    0        27            .0            0        18            .0
                                No     0        120           100.0         0        112           100.0
         Overall Percentage                                   81.6                                 86.2
Step 2   previously defaulted   Yes    0        27            .0            0        18            .0
                                No     1        119           99.2          3        109           97.3
         Overall Percentage                                   81.0                                 83.8
Step 3   previously defaulted   Yes    1        26            3.7           1        17            5.6
                                No     1        119           99.2          3        109           97.3
         Overall Percentage                                   81.6                                 84.6

a. The cut value is .500

b. Selected cases validate EQ 1

c. Unselected cases validate NE 1

d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.
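The percentages in the classification table follow directly from the cell counts. The Python sketch below recomputes the Step 3 figures for the selected and unselected cases and compares them with the accuracy of always predicting "No"; the function name `summarize` and its argument names are illustrative only.

```python
def summarize(yes_as_yes, yes_as_no, no_as_yes, no_as_no):
    """Percentage correct overall and per observed group, from a 2x2 classification table."""
    total = yes_as_yes + yes_as_no + no_as_yes + no_as_no
    return {
        "overall": 100.0 * (yes_as_yes + no_as_no) / total,
        "yes_correct": 100.0 * yes_as_yes / (yes_as_yes + yes_as_no),
        "no_correct": 100.0 * no_as_no / (no_as_yes + no_as_no),
    }


# Step 3 counts: observed Yes (predicted Yes, predicted No), observed No (predicted Yes, predicted No).
selected = summarize(1, 26, 1, 119)
unselected = summarize(1, 17, 3, 109)

# Accuracy of predicting "No" for every selected case (the Step 0 model).
baseline = 100.0 * 120 / 147

print({k: round(v, 1) for k, v in selected.items()})
print({k: round(v, 1) for k, v in unselected.items()})
print(round(baseline, 1))
```

On the selected cases the Step 3 overall percentage (81.6) equals the Step 0 baseline, so at the .5 cut value the gain from the three predictors shows up only in the single "Yes" case that is now classified correctly.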

Variables in the Equation

                       B       S.E.   Wald     df   Sig.   Exp(B)   95% C.I. for EXP(B)
                                                                    Lower    Upper
Step 1(a)   Days       .002    .001   5.118    1    .024   1.002    1.000    1.003
            Constant   .822    .337   5.947    1    .015   2.274
Step 2(b)   Interest   -.068   .029   5.346    1    .021   .934     .882     .990
            Days       .002    .001   7.869    1    .005   1.002    1.001    1.004
            Constant   1.815   .579   9.833    1    .002   6.143
Step 3(c)   Balance    .000    .000   4.216    1    .040   1.000    1.000    1.000
            Interest   -.106   .040   7.147    1    .008   .900     .832     .972
            Days       .002    .001   4.888    1    .027   1.002    1.000    1.003
            Constant   2.966   .901   10.827   1    .001   19.415

a. Variable(s) entered on step 1: Days.

b. Variable(s) entered on step 2: Interest.

c. Variable(s) entered on step 3: Balance.
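To show how the fitted coefficients turn into a predicted probability, the Python sketch below scores one loan with the Step 2 equation, logit = 1.815 - 0.068 x Interest + 0.002 x Days. This is an illustration under stated assumptions, not the model the bank would deploy: the coefficients are taken as printed (rounded to three decimals, which matters most for Days), Step 3 is not used because the Balance coefficient is printed as .000, and the function name `prob_no_default` is hypothetical. The probability refers to the category with internal value 1, "No".

```python
import math

# Step 2 coefficients as printed in the table (rounded to three decimals).
B_CONST, B_INTEREST, B_DAYS = 1.815, -0.068, 0.002


def prob_no_default(interest, days):
    """Predicted probability of the 'No' category under the Step 2 equation."""
    logit = B_CONST + B_INTEREST * interest + B_DAYS * days
    return 1.0 / (1.0 + math.exp(-logit))


# Interest and Days from the first row of Appendix 1.
p = prob_no_default(interest=28, days=808)

# Exp(B): multiplicative change in the odds of 'No' per one-unit rise in Interest.
odds_ratio_interest = math.exp(B_INTEREST)

print(round(p, 4), round(odds_ratio_interest, 3))
```

The result will not equal the PRE_1 value saved for that row, since PRE_1 comes from the model SPSS actually saved with unrounded coefficients.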

Model if Term Removed

Variable            Model Log Likelihood   Change in -2 Log Likelihood   df   Sig. of the Change
Step 1   Days       -70.107                5.736                         1    .017
Step 2   Interest   -67.239                6.148                         1    .013
         Days       -68.803                9.276                         1    .002
Step 3   Balance    -64.165                4.332                         1    .037
         Interest   -66.917                9.836                         1    .002
         Days       -64.737                5.475                         1    .019

Variables not in the Equationa

Score df Sig.

Step 1 Variables

Edlev 1.132 1 .287

Loan .447 1 .504

Balance .668 1 .414

Collateral 1.068 1 .301

Interest 5.649 1 .017

Gender 1.052 1 .305

Step 2 Variables

Edlev .877 1 .349

Loan .332 1 .565

Balance 4.697 1 .030

Collateral .681 1 .409

Gender .956 1 .328

Step 3 Variables

Edlev .434 1 .510

Loan .524 1 .469

Collateral 1.399 1 .237

Gender 1.656 1 .198

a. Residual Chi-Squares are not computed because of redundancies.


Step number: 1

Observed Groups and Predicted Probabilities

[Classification plot not reproduced: frequency (vertical axis, 0 to 16) against predicted probability of membership for No (horizontal axis, 0 to 1). The cut value is .50. Symbols: Y = Yes, N = No. Each symbol represents 1 case.]

Step number: 2

Observed Groups and Predicted Probabilities

[Classification plot not reproduced: frequency (vertical axis, 0 to 16) against predicted probability of membership for No (horizontal axis, 0 to 1). The cut value is .50. Symbols: Y = Yes, N = No. Each symbol represents 1 case.]


Step number: 3

Observed Groups and Predicted Probabilities

[Classification plot not reproduced: frequency (vertical axis, 0 to 16) against predicted probability of membership for No (horizontal axis, 0 to 1). The cut value is .50. Symbols: Y = Yes, N = No. Each symbol represents 1 case.]


Logistic Regression

[DataSet6] C:\Users\Dr. Faith Adebisi\Documents\Music\Desktop\New folder (3)\DR ADAMU SPSS. 23.sav

Case Processing Summary

Unweighted Cases(a)                       N     Percent
Selected Cases     Included in Analysis   147   49.0
                   Missing Cases          0     .0
                   Total                  147   49.0
Unselected Cases                          153   51.0
Total                                     300   100.0

a. If weight is in effect, see classification table for the total number of cases.

Dependent Variable Encoding

Original Value Internal Value

Yes 0

No 1

Block 0: Beginning Block

Classification Table(a,b)

                                       Predicted
                                       Selected Cases(c)                    Unselected Cases(d,e)
                                       previously defaulted   Percentage    previously defaulted   Percentage
Observed                               Yes      No            Correct       Yes      No            Correct
Step 0   previously defaulted   Yes    0        27            .0            0        18            .0
                                No     0        120           100.0         0        112           100.0
         Overall Percentage                                   81.6                                 86.2

a. Constant is included in the model.

b. The cut value is .500

c. Selected cases validate EQ 1

d. Unselected cases validate NE 1

e. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.

Variables in the Equation

B S.E. Wald df Sig. Exp(B)

Step 0 Constant 1.492 .213 49.042 1 .000 4.444

Variables not in the Equationa

Score df Sig.

Step 0 Variables

Edlev 1.730 1 .188

Loan .253 1 .615

Balance 3.874 1 .049

Collateral .438 1 .508

Interest 2.470 1 .116

Days 5.338 1 .021

Gender 1.269 1 .260

a. Residual Chi-Squares are not computed because of redundancies.

Block 1: Method = Enter

Omnibus Tests of Model Coefficients

Chi-square df Sig.

Step 1

Step 24.869 7 .001

Block 24.869 7 .001

Model 24.869 7 .001

Model Summary

Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      115.345(a)          .156                   .253

a. Estimation terminated at iteration number 7 because parameter estimates changed by less than .001.
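The two pseudo R-square values in the Model Summary can be recovered from the -2 log likelihood and the model chi-square. The Python sketch below applies the standard Cox & Snell and Nagelkerke formulas to the printed figures for the Enter model, and also rebuilds the constant-only -2 log likelihood from the 27 "Yes" and 120 "No" selected cases as a consistency check; variable names are illustrative.

```python
import math

n = 147                 # selected cases included in the analysis
neg2ll_model = 115.345  # -2 log likelihood of the Enter model
model_chi2 = 24.869     # Model chi-square from the Omnibus table

# -2 log likelihood of the constant-only model, two ways.
neg2ll_null = neg2ll_model + model_chi2
neg2ll_null_from_counts = -2 * (27 * math.log(27 / n) + 120 * math.log(120 / n))

# Cox & Snell R-square, then Nagelkerke's rescaling to a maximum of 1.
cox_snell = 1 - math.exp(-model_chi2 / n)
nagelkerke = cox_snell / (1 - math.exp(-neg2ll_null / n))

print(round(neg2ll_null, 3), round(neg2ll_null_from_counts, 3))
print(round(cox_snell, 3), round(nagelkerke, 3))
```

Both routes give a constant-only value of about 140.214, and the two R-square figures round to the .156 and .253 shown in the table.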

Hosmer and Lemeshow Test

Step Chi-square df Sig.

1 9.059 8 .337

Contingency Table for Hosmer and Lemeshow Test

previously defaulted = Yes previously defaulted = No Total

Observed Expected Observed Expected

Step 1

1 9 7.888 6 7.112 15

2 4 5.140 11 9.860 15

3 6 4.135 9 10.865 15

4 0 3.153 15 11.847 15

5 3 2.432 12 12.568 15

6 1 1.683 14 13.317 15

7 1 1.239 14 13.761 15

8 2 .852 13 14.148 15

9 1 .396 14 14.604 15

10 0 .082 12 11.918 12
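The Hosmer and Lemeshow statistic is the Pearson chi-square summed over both outcome columns of this contingency table, referred to a chi-square distribution with (number of groups - 2) degrees of freedom. The Python sketch below recomputes it from the printed observed and expected counts for the Enter model. The helper `chi2_sf_even_df` is a hypothetical name and uses the closed-form tail probability that holds only for an even number of degrees of freedom.

```python
import math

# (observed Yes, expected Yes, observed No, expected No) for the ten groups above.
groups = [
    (9, 7.888, 6, 7.112), (4, 5.140, 11, 9.860), (6, 4.135, 9, 10.865),
    (0, 3.153, 15, 11.847), (3, 2.432, 12, 12.568), (1, 1.683, 14, 13.317),
    (1, 1.239, 14, 13.761), (2, 0.852, 13, 14.148), (1, 0.396, 14, 14.604),
    (0, 0.082, 12, 11.918),
]


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only when df is even."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


hl = sum((oy - ey) ** 2 / ey + (on - en) ** 2 / en for oy, ey, on, en in groups)
df = len(groups) - 2
p_value = chi2_sf_even_df(hl, df)

print(round(hl, 3), df, round(p_value, 3))
```

Up to rounding of the printed expected counts, this reproduces the chi-square of 9.059 on 8 degrees of freedom with a significance of about .337, so the Enter model shows no significant lack of fit.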

Classification Table(a)

                                       Predicted
                                       Selected Cases(b)                    Unselected Cases(c,d)
                                       previously defaulted   Percentage    previously defaulted   Percentage
Observed                               Yes      No            Correct       Yes      No            Correct
Step 1   previously defaulted   Yes    5        22            18.5          6        12            33.3
                                No     1        119           99.2          5        107           95.5
         Overall Percentage                                   84.4                                 86.9

a. The cut value is .500

b. Selected cases validate EQ 1

c. Unselected cases validate NE 1

d. Some of the unselected cases are not classified due to either missing values in the independent variables or categorical variables with values out of the range of the selected cases.

Variables in the Equation

                         B       S.E.    Wald    df   Sig.   Exp(B)   95% C.I. for EXP(B)
                                                                      Lower    Upper
Step 1(a)   Edlev        -.189   .187    1.017   1    .313   .828     .573     1.195
            Loan         .000    .000    .051    1    .821   1.000    1.000    1.000
            Balance      .000    .000    6.861   1    .009   1.000    1.000    1.000
            Collateral   .000    .000    2.430   1    .119   1.000    1.000    1.000
            Interest     -.107   .042    6.625   1    .010   .899     .828     .975
            Days         .002    .001    4.468   1    .035   1.002    1.000    1.003
            Gender       -.554   .475    1.364   1    .243   .574     .227     1.456
            Constant     4.179   1.343   9.688   1    .002   65.328

a. Variable(s) entered on step 1: Edlev, Loan, Balance, Collateral, Interest, Days, Gender.


Step number: 1

Observed Groups and Predicted Probabilities

[Classification plot not reproduced: frequency (vertical axis, 0 to 16) against predicted probability of membership for No (horizontal axis, 0 to 1). The cut value is .50. Symbols: Y = Yes, N = No. Each symbol represents 1 case.]