
Page 1:

ICS 278: Data Mining

Lecture 18: Credit Scoring

Padhraic Smyth
Department of Information and Computer Science

University of California, Irvine

Page 2:

Presentations for Next Week

• Names for each day will be emailed out by tomorrow

• Instructions:
– Email me your presentations by 12 noon on the day of your presentation (no later please)
– I will load them on my laptop (so no need to bring a machine)
– Each presentation will be 6 minutes long + 2 minutes for questions
• So probably about 4 to 8 (max) slides per presentation

Page 3:

References on Credit Scoring

D. J. Hand and W. E. Henley, "Statistical Classification Methods in Consumer Credit Scoring: A Review," Journal of the Royal Statistical Society: Series A, Volume 160, Issue 3, November 1997.

Available online at the class Web page under lecture notes.

Also:
L. C. Thomas, D. B. Edelman, and J. N. Crook, Credit Scoring and its Applications, SIAM, 2002.

E. Mays (editor), Credit Risk Modeling, American Management Association, 1998.

Page 4:

Outline

• Credit Scoring
– Problem definition, standard notation

• Data Sources

• Models
– Logistic regression, trees, linear regression, etc.

• Model building issues
– Problem of reject inference

• Practical issues
– Cutoff selection, updating models

Page 5:

The Problem of Credit Scoring

• Applicants apply for a bank loan
– Population 1 is rejected
– Population 2 is accepted

• Population 2a repays their loan -> labeled "good"
• Population 2b goes into some form of default -> labeled "bad"

• Model building
– Build a model that can discriminate population 2a from population 2b
– Usually treated as a classification problem
– Typically want to estimate p(good | features) and rank individuals this way (see the sketch below)

• Widely used by banks and credit card companies
– Similar problems occur in direct marketing and other "scoring" applications
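
As a concrete illustration of the "estimate p(good | features) and rank" formulation above, here is a minimal sketch using scikit-learn; the features, labels, and the 50% acceptance cutoff are synthetic assumptions, not anything from the lecture.

```python
# Sketch: estimate p(good | features) with a probabilistic classifier and rank applicants.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))        # application features (synthetic stand-ins)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)   # 1 = "good", 0 = "bad"

model = LogisticRegression().fit(X, y)

X_new = rng.normal(size=(200, 3))                 # new applicants
p_good = model.predict_proba(X_new)[:, 1]         # estimated p(good | features)

ranking = np.argsort(-p_good)                     # rank applicants, best first
accepted = ranking[: int(0.5 * len(ranking))]     # e.g. accept the top half (cutoff is a business choice)
```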

Page 6:

Many different applications for Customer Scoring

• Other financial applications:
– Delinquent loans: who is most likely to pay up
• Uses historical data on who paid in the past
• Often used to create "portfolios" of delinquent debt
– Customer revenue
• How much will each customer generate in revenue over the next K years

• Predicting marketing response
– Cost of a mailer to a customer is on the order of $1
– Targeted marketing
• Rank customers in terms of "likelihood to respond"

• "Churn" prediction
– Predicting which customers are most likely to switch to another brand
– E.g., wireless phone service
– Scores used to rank customers and then target the most likely with incentives

• Many more….

Page 7:

Some background

• History
– General ideas started in the 1950's
• e.g., Bill Fair and Earl Isaac -> Fair Isaac -> FICO scores

– Initially a bit controversial
• Worries about it being unfair to some segments of society
– US Equal Credit Opportunity Acts, 1975/76
• Skepticism that "machine generated rules" from data could outperform human generated guidelines

– First adopted in credit-card approvals (1960's)
– Later broadly adopted in home loans, etc.
– Now widely accepted and used by almost all banks, credit-granting agencies, etc.

Page 8:

Data Sources

• Data from the loan application
– Age, address, income, profession, SS#, number of credit cards, savings, etc.
– Easy to obtain

• Internal Performance data
– How the individual has performed on other loans with the same bank
– May only be available for a subset of customers

• External Performance data:
– Credit Reports
• How the individual has performed historically on all loans and credit cards
• Relatively expensive to obtain (e.g., $1 per individual)
– Court Judgements
– Real Estate records

• Macro-level external data
– Demographic characteristics for applicant's zip code or census tract

Page 9:

Loan Application Data

• Issues
– Data entry errors (e.g., birthday = date of loan application)
– Deliberate falsifications (e.g., over-reporting of income)
– Legal issues
• US Equal Credit Opportunity Acts, 1975/76
• Illegal to use race, color, religion, national origin, sex, marital status, or age in the decision to grant credit
• But what if other variables are highly predictive of some of these variables?

Page 10:

Variable Name   Description                      Codings
dob             Year of birth                    If unknown, the year will be 99
nkid            Number of children               number
dep             Number of other dependents       number
phon            Is there a home phone            1 = yes, 0 = no
sinc            Spouse's income
aes             Applicant's employment status    V = Government
                                                 W = housewife
                                                 M = military
                                                 P = private sector
                                                 B = public sector
                                                 R = retired
                                                 E = self employed
                                                 T = student
                                                 U = unemployed
                                                 N = others
                                                 Z = no response

Page 11:

Variable Name   Description                      Codings
dainc           Applicant's income
res             Residential status               O = Owner
                                                 F = tenant furnished
                                                 U = Tenant Unfurnished
                                                 P = With parents
                                                 N = Other
                                                 Z = No response
dhval           Value of Home                    0 = no response or not owner
                                                 000001 = zero value
                                                 blank = no response
dmort           Mortgage balance outstanding     0 = no response or not owner
                                                 000001 = zero balance
                                                 blank = no response
doutm           Outgoings on mortgage or rent
doutl           Outgoings on Loans
douthp          Outgoings on Hire Purchase
doutcc          Outgoings on credit cards
bad             Good/bad indicator               1 = Bad
                                                 0 = Good
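
To make fields coded like the tables above usable by a model, the categorical codes are usually expanded into indicator variables and the special codes (the "99" year, the "0" vs "000001" home-value convention) handled explicitly. A small pandas sketch, with invented example rows and column names taken from the table:

```python
# Sketch: turning the coded application fields into model-ready features.
import pandas as pd
import numpy as np

apps = pd.DataFrame({
    "dob":   [65.0, 99.0, 72.0],          # year of birth; 99 = unknown
    "phon":  [1, 0, 1],                   # 1 = yes, 0 = no
    "aes":   ["P", "U", "Z"],             # employment status code
    "res":   ["O", "F", "Z"],             # residential status code
    "dhval": ["120000", "0", "000001"],   # home value; 0 = no response/not owner, 000001 = zero value
})

# Flag the special "unknown year" coding rather than treating 99 as a real value
apps["dob_unknown"] = (apps["dob"] == 99).astype(int)
apps.loc[apps["dob"] == 99, "dob"] = np.nan

# Home value: 0 means "no response or not owner", 000001 means a true zero value
apps["dhval_missing"] = (apps["dhval"] == "0").astype(int)
apps["dhval_num"] = pd.to_numeric(apps["dhval"].replace({"0": np.nan, "000001": "0"}))

# Expand the categorical status codes (aes, res) into indicator columns
features = pd.get_dummies(
    apps[["phon", "dob", "dob_unknown", "dhval_num", "dhval_missing", "aes", "res"]],
    columns=["aes", "res"],
)
```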

Page 12:

Page 13:

Credit Report Data

• Available from 3 major bureaus in the US:
– Experian, Trans-Union, and Equifax

• Data in the form of a list of transactions/events
– Typically needs to be converted into feature-value form (a sketch follows below)
• E.g., "number of credit cards opened in past 12 months"
– Can result in a huge number of features

• Cost varies as a function of type and time-window of data requested
– Interesting problem: "cost-optimal" downloading of selected credit report features, adapted to each individual as a function of cheaper features
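
A sketch of the "convert the event list into feature-value form" step, e.g., computing "number of credit cards opened in the past 12 months"; the event layout and field names here are assumptions for illustration, not a real bureau format:

```python
# Sketch: aggregating a credit-report event list into per-applicant features.
import pandas as pd

# Assumed simplified layout: one row per credit-report event
events = pd.DataFrame({
    "applicant_id": [1, 1, 1, 2, 2],
    "event":        ["card_opened", "card_opened", "late_payment", "card_opened", "loan_opened"],
    "date":         pd.to_datetime(["2004-01-15", "2003-02-01", "2004-03-10", "2004-04-01", "2001-06-30"]),
})

as_of = pd.Timestamp("2004-06-01")
window_start = as_of - pd.DateOffset(months=12)

# Feature: number of credit cards opened in the past 12 months, per applicant
n_cards_12m = (
    events[(events["event"] == "card_opened") & (events["date"] >= window_start)]
    .groupby("applicant_id")
    .size()
    .reindex(events["applicant_id"].unique(), fill_value=0)
    .rename("n_cards_opened_12m")
)
```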

Page 14:

Defining Good and Bad

• Good versus Bad
– Not necessarily clear how to define 2 classes
– E.g.,
• bad = ever 3 or more payments in arrears?
• bad = 2 or more payments in arrears more than once?
– A "spectrum" of behavior
• Never any problems in payments
• Occasional problems
• Persistent problems
– Typical to discard the intermediate cases and also those with insufficient experience to reliably classify them (see the sketch below)

• Not ideal theoretically, but convenient
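
A sketch of one possible labeling rule from the list above (bad = ever 3 or more payments in arrears, good = never in arrears, intermediate and short-history cases discarded); the column names and the 12-month minimum history are illustrative assumptions:

```python
# Sketch: deriving good/bad labels from repayment history and discarding the grey cases.
import pandas as pd

perf = pd.DataFrame({
    "customer_id":     [1, 2, 3, 4],
    "max_arrears":     [0, 1, 3, 5],    # worst number of payments in arrears
    "months_on_books": [24, 6, 18, 30],
})

def label(row):
    if row["months_on_books"] < 12:     # insufficient experience to classify reliably
        return None
    if row["max_arrears"] >= 3:         # "bad": ever 3 or more payments in arrears
        return 1
    if row["max_arrears"] == 0:         # "good": never any problems in payments
        return 0
    return None                         # intermediate cases: discarded

perf["bad"] = perf.apply(label, axis=1)
training_set = perf.dropna(subset=["bad"])   # keep only the clearly good or clearly bad cases
```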

Page 15:

Selecting a Data Set for Model Building

• Sample selection
– Typical sample sizes ~ 10k to 100k per class
– Should be representative of customers who will apply in the future
– Need to be able to get the relevant variables for this set of customers
• Internal performance data
• External performance data
• Etc.

• External data sources (e.g., credit reports) can result in a very large number of possible variables
– E.g., in the 1000's
– E.g., "number of accounts opened in past 12/24/36/… months"
– Typically some form of variable selection is done before building a model
• Often based on univariate criteria such as information gain (see the sketch below)
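
A sketch of the univariate screening step, ranking candidate variables by information gain (reduction in class entropy); the toy data is invented:

```python
# Sketch: ranking candidate variables by univariate information gain
# as a pre-modeling screening step.
import numpy as np
import pandas as pd

def entropy(labels):
    p = np.bincount(labels) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    base = entropy(labels)
    cond = 0.0
    for value in np.unique(feature):
        mask = feature == value
        cond += mask.mean() * entropy(labels[mask])
    return base - cond

# Toy data: two candidate variables and a binary good/bad label
df = pd.DataFrame({
    "res":  ["O", "O", "F", "P", "F", "O", "P", "F"],
    "phon": [1, 1, 0, 1, 0, 1, 0, 0],
    "bad":  [0, 0, 1, 0, 1, 0, 1, 1],
})

gains = {col: information_gain(df[col].to_numpy(), df["bad"].to_numpy())
         for col in ["res", "phon"]}
ranked = sorted(gains.items(), key=lambda kv: -kv[1])   # keep the highest-scoring variables
```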

Page 16:

Models used in Credit Scoring

• Regression:
– Ignore the fact that we are estimating a probability
– Typically linear regression is used

• Classification (more common approach)
– Logistic regression (most widely used)
– Decision trees (becoming more popular)
– Neural networks (experimented with, but not used in practice so much)
– Nearest neighbors
– Model combining – some work in this area
– SVMs – too new, relatively unproven

• General comments
– Many trade secrets; companies like Fair Isaac do not publish details
– Generally the industry is conservative: prefers well-established methods
– Classification accuracy is only one part of the overall solution….

Page 17:

Logistic Regression Models

Model: p = g^-1( w0 + w1x1 + … + wpxp ), where the link g applied to p = p(good | features) is the logit:

logit(p) = log( p / (1 - p) ) = log(odds) = w0 + w1x1 + … + wpxp

[Figure: training data with the fitted linear predictor w0 + w1x1, and a plot of logit(p) against p, with p running from 0.0 to 1.0 and logit(p) = 0 at p = 0.5]

Note that near logit(p) = 0 (i.e., p near 0.5), logit(p) is almost linear, so linear and logistic regression will be similar in this region.
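
A small numeric sketch of the logit link and the near-linearity remark above: the slope of logit(p) at p = 0.5 is 1/(0.5 × 0.5) = 4, so logit(p) ≈ 4(p - 0.5) near the middle of the range. This is only a verification sketch, not part of the lecture:

```python
# Sketch: the logit link and its near-linearity around p = 0.5.
import numpy as np

def logit(p):                    # the link g: logit(p) = log(p / (1 - p)) = log(odds)
    return np.log(p / (1.0 - p))

def inv_logit(z):                # g^-1, so p = inv_logit(w0 + w1*x1 + ... + wp*xp)
    return 1.0 / (1.0 + np.exp(-z))

p = np.linspace(0.35, 0.65, 7)
approx = 4.0 * (p - 0.5)         # slope of logit at p = 0.5 is 1 / (0.5 * 0.5) = 4

print(np.round(logit(p), 3))     # e.g. logit(0.6) = 0.405 ...
print(np.round(approx, 3))       # ... vs. the linear approximation 0.400
print(inv_logit(logit(0.3)))     # round-trips back to (approximately) 0.3
```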

Page 18:

Modeling Example

Model                                    Bad Risk Rate (%)
k nearest neighbor with special metric   43.09
k nearest neighbor (standard)            43.25
logistic regression                      43.30
linear regression                        43.36
decision tree                            43.77

(from the Hand and Henley paper)

Page 19:

Evaluation Methods

• Decile/Centile reporting:
– Rank customers by predicted scores
– Report the "lift" rate in each decile (and cumulatively) compared to accepting everyone

• Receiver Operating Characteristics
– Vary the classification threshold
– Plot proportion of good risks accepted vs. bad risks accepted

• Bad risk rate = bad risk among those accepted
– Let p = proportion of good risks
– Let a = proportion accepted
– E.g., can show that, with a > p, the bad risk rate among those accepted is lower bounded by 1 - p/a
– E.g., p = 0.45, a = 0.70 => the bad risk rate must be between 0.35 and 0.78 (see the sketch below)
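
A sketch of decile lift reporting and a numeric check of the bounds just quoted; the 0.78 figure corresponds to the analogous upper bound (1 - p)/a. The scores and labels below are synthetic:

```python
# Sketch: decile "lift" reporting plus a check of the bad-risk-rate bounds.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
score = rng.random(10_000)                                    # model score; higher = more likely good
good = (rng.random(10_000) < 0.3 + 0.6 * score).astype(int)   # synthetic labels correlated with score

df = pd.DataFrame({"score": score, "good": good})
df["decile"] = pd.qcut(df["score"], 10, labels=False)         # 0 = lowest-scoring decile, 9 = highest

# Lift of each decile relative to accepting everyone
lift = df.groupby("decile")["good"].mean() / df["good"].mean()

# Bounds on the bad risk rate among those accepted, when a > p
p, a = 0.45, 0.70          # proportion of good risks, proportion accepted
lower = 1 - p / a          # ~0.357 (the slide's 0.35; lower bound 1 - p/a)
upper = (1 - p) / a        # ~0.786 (the slide's 0.78; analogous upper bound)
print(round(lower, 3), round(upper, 3))
```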

Page 20:

Economics of Credit Scoring

• Classification accuracy is not the appropriate metric

• Benefit = Increase in revenue from using model - cost of developing and installing model

• Model development: anywhere from $5k to $100k depending on the complexity of modeling project

• Model installation: can be expensive (software, testing, legal requirements)
– Model maintenance and updating should also probably be included

• Revenue increase is based on estimated performance plus assumptions about the cost of bad risks versus good risks

• Small improvements in accuracy (e.g., 1 to 5%) could lead to significant gains if the model is used on large numbers of customers

Page 21:

Problem of Reject Inference

• Typically the population available for training consists only of past applicants who were accepted
– Application data is available for "rejects", but no performance data

• Question:
– Is there a way to use the data from rejected applicants?

• Answer: no widely accepted approach. Methods include:
– Define all rejects as "bad" (not reliable!)
– Build a statistical model (treat labels as missing, but biased)
• Can be quite complex; see Section 5 in the Hand and Henley paper
• (a crude sketch of this idea follows below)
– Grant credit to some fraction of rejects and track their performance so that the "full population" is sampled
• Rarely used for loans, but ideally is the best method
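
As a crude illustration of the "treat the reject labels as missing" idea, here is a self-training sketch: fit on accepts, pseudo-label the confident rejects, refit on the combined set. This is only one possible, simplified instance, it inherits the bias discussed above, and it is not the method of the lecture or of Hand and Henley:

```python
# Sketch: a crude reject-inference step via self-training (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_acc = rng.normal(size=(500, 4))                               # accepted applicants: features ...
y_acc = (X_acc[:, 0] + rng.normal(size=500) > 0).astype(int)    # ... and observed good(1)/bad(0) labels
X_rej = rng.normal(loc=-0.5, size=(300, 4))                     # rejected applicants: features only

base = LogisticRegression().fit(X_acc, y_acc)                   # model built on accepts only

p_rej = base.predict_proba(X_rej)[:, 1]
confident = (p_rej > 0.9) | (p_rej < 0.1)                       # pseudo-label only the confident rejects
y_pseudo = (p_rej[confident] > 0.5).astype(int)

X_all = np.vstack([X_acc, X_rej[confident]])
y_all = np.concatenate([y_acc, y_pseudo])
augmented = LogisticRegression().fit(X_all, y_all)              # refit on accepts + pseudo-labeled rejects
```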

Page 22:

Other issues

• Threshold selection
– Above what threshold should loans be granted?
– Depends on the goals of the project
• E.g., focusing on a small set of high-scoring customers versus "widening the net" to include a larger number (but still minimizing risk)
• (a cutoff-selection sketch follows after this list)

• Time-dependent classification
– What really matters is what the customer will do at time t+T
– Can we model the "state" of a customer (rather than statically)?
• Still somewhat of a research topic

• Overrides
– Loans are still manually "signed off". The bank may sometimes override the system's recommendation
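
A sketch of cutoff selection under two of the goals above: hitting a target acceptance rate, or maximizing expected profit under assumed per-loan gains and losses (the +200 / -1000 figures are made up, not from the lecture):

```python
# Sketch: choosing a score cutoff for a target acceptance rate, or to maximize
# expected profit under assumed economics.
import numpy as np

rng = np.random.default_rng(3)
p_good = rng.random(5_000)                   # estimated p(good | features) for the applicant pool

# (1) Cutoff that accepts a target fraction of applicants, e.g. the top 40% of scores
target_accept = 0.40
cutoff_rate = np.quantile(p_good, 1 - target_accept)

# (2) Cutoff that maximizes expected profit, assuming each good loan earns +200
#     and each bad loan costs -1000 (made-up figures)
gain_good, loss_bad = 200.0, -1000.0
cutoffs = np.linspace(0.0, 1.0, 101)
expected_profit = [
    np.sum(np.where(p_good >= c, p_good * gain_good + (1 - p_good) * loss_bad, 0.0))
    for c in cutoffs
]
cutoff_profit = cutoffs[int(np.argmax(expected_profit))]
print(round(cutoff_rate, 3), round(cutoff_profit, 3))
```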

Page 23:

The model works… now what?

• Implementation
– Depends on whether the model is replacing an existing automated model
– … or this is the first time modeling is being applied to the problem
– Many software issues in terms of databases, security, etc.

• Monitoring and tracking
– Important to see how the scorecard works in practice
– Generating monthly/quarterly reports on scorecard performance
• (naturally there will be some delay in this)
– Analyzing performance in detail on segments, by attribute, etc.

• Time for a new model?
– E.g., population has changed significantly
– E.g., new (cheap and useful) data available
– E.g., new modeling technology available