2 subprime consumer lending c - delphianalytics.com · 2 subprime consumer lending. p ......

Craig M. Allen, Ph.D., is the founder of Delphi Structured Finance Corp., based in the United States, and Delphi Compagnie Financière, based in Geneva, Switzerland, providing asset-securitization services on both continents.

Allen has helped to establish a new market in callable mortgage bond products in Denmark and (together with his partner, a Danish bank) has built a securitization team that has issued approximately $3 billion of actively traded new securities. Delphi has also developed a trade-finance product and, together with a German bank, has created a finance company with a portfolio of commercial performance securities exceeding $300 million.

Prior to establishing Delphi, Allen was a founding partner of Aegis Holdings Corp. and president of Aegis Financial Advisors, Inc. He was responsible at Aegis for developing unusual niche securities.

Earlier in his career, Allen was with U.S. investment bank Bear Stearns & Co., where he developed the credit-enhancement capacity that allowed the firm to create rated securities from whole-loan (non-U.S. agency) mortgage collateral, and played an integral role in the development of the French asset-backed market.

Prior to Bear Stearns, Allen was involved in multifamily real estate, owned a small mortgage company in Dallas, Texas, and ran a small consulting firm. He taught courses in multivariate and nonparametric statistics as well as other decision-science courses at the University of Texas at Dallas.

Allen holds a bachelor’s degree in psychology from Brigham Young University. He received a master of science degree in animal behavior, a master of arts degree in mathematics, and his doctorate in decision-making under uncertainty from Arizona State University (a special program involving the psychology, mathematics, and engineering departments).

Special thanks are due to Spencer Kimball for assistance with the section on “Loan Grouping and Neighborhood Membership Techniques.” Thanks are also due Dawn Howey and Linda for help in producing this chapter.



Predictive credit modeling, it seems, has been a bit like El Dorado, the fabled city of gold sought by the Spanish conquistadores: ephemeral, always just over the horizon, yet promising wealth untold. Fortunately, however, the elusive goal (quantitative assessment of creditworthiness) is coming ever nearer to realization.

Credit scoring is not an easy problem. The goal is somehow to measure a potential borrower and, from that initial assessment, make predictions about the borrower’s expected behavior. The underlying hypotheses of any credit-scoring methodology are that similar borrowers tend to behave similarly and that historical performance assists in predicting future performance. The difficulty, however, as will be shown in this chapter, consists first in framing the problem and then in determining which borrowers are, for these purposes, similar to one another.

This chapter will show the early development of credit assessment as a traditional credit-scoring problem, then expand upon that view to illustrate the techniques that are being utilized to explore more difficult credit problems, such as those associated with subprime lending. The traditional approach to credit scoring will be illustrated mainly from a conceptual viewpoint (that is, visually via exhibits); the more formal mathematics appear in a later section, “Mathematical Aspects: The Linear Discriminant Function.” More modern techniques for credit assessment will be examined, again with a conceptual development and only summarizing principles spelled out in the text, leaving the more formal discussion for the sections on “Mathematical Aspects: The Computation of Principal Components” and “Mathematical Aspects: Nearest Neighbors Clustering.”

Chapter A

Credit Scoring and Risk-Adjusted Pricing: A Review of Techniques

Craig M. Allen, Ph.D.
Chairman, Delphi Structured Finance Corp.
Président/Directeur Général, Delphi Compagnie Financière

MEASUREMENT

This task of credit assessment is, at the outset, a problem of measurement. And, like all measurement challenges, the first two issues to consider, those that form a foundation for this craft, are reliability and validity.

Reliability

For measurements to be reliable, they must be repeatable. Not only must the data contributing to the measure be objective (in the sense that most observers would ascribe a similar value) but the measurement process (that is, the calculation steps) must produce a consistent result. A credit-scoring process that repeatedly measures identical (or essentially similar) borrowers should produce identical (or essentially similar) results. Further, presuming that the attribute of creditworthiness is something relatively stable (allowing us to predict behavior two or three years into the future), such a measure for the same individual should not vacillate wildly from month to month. This sounds easier to achieve than is the case in actual practice, and bears some discussion at the outset.

An assessment of similarity among borrowers (that is, a good credit score) should be based upon directly observable variables, such as the number of months at the current residence, months at the current employment, number of dependents, debt payment history, monthly income (although here it starts to get subjective), current levels of debt, and other obligations. These directly measurable variables are the beginning of a reliable credit-scoring process. But, as one might guess, even many of those seemingly observable measures require a good deal of subjective input. Usually, in implementation, the actual borrowers’ real-life situations give rise to more complex interpretations: Should irregular income be fully counted? What excuses exonerate previous delinquent payments?

These objectively based variables comprise only the starting point for a predictive credit measure. Only if consistency can prevail in the subjective determinations of the input variables could one hope for a reliable credit-scoring mechanism. Length, for example, is an inherently objective variable; but if the ruler by which it is measured is elastic and can be stretched by the observer, then the measured variable is not reliable even though the underlying concept of length suggests it should be so. This subjective capacity is a factor over which the scoring-model developers have no control. The adage, “garbage in, garbage out,” is applicable, and poor management control of subjective decision-makers (or poor training) has frustrated many a scoring system.


The input variables are only a part of a reliable measurement system. The rest of the process, the calculation part, must also contribute to a reliable measure. Even when the ruler is inelastic, if our recording of length is done with inconsistent precision then differences between lengths will be unreliable. That would be the case if measurements were sometimes to the nearest centimeter, other times to the nearest thousandth of a centimeter, and other times to the nearest inch. For another example, the area of a rectangle might seem to be an inherently reliable attribute: If both length and width are meticulously measured, and area is calculated as the product of these two, one would expect a reliable resulting measure. Suppose, however, that the calculator used to compare the areas of two rectangles operates to only one decimal place of precision; the resulting calculations may not reliably recognize differences in area.
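The rectangle example can be made concrete with a short sketch. The dimensions below are illustrative values chosen for this note, not figures from the chapter, and the `decimals` argument merely stands in for the limited-precision calculator described above.

```python
from typing import Optional


def area(length: float, width: float, decimals: Optional[int] = None) -> float:
    """Area of a rectangle, optionally rounded as a low-precision calculator would."""
    result = length * width
    return result if decimals is None else round(result, decimals)


# Two rectangles with slightly different widths (hypothetical measurements).
rect_a = (2.00, 3.01)
rect_b = (2.00, 3.02)

full_a, full_b = area(*rect_a), area(*rect_b)
coarse_a, coarse_b = area(*rect_a, decimals=1), area(*rect_b, decimals=1)

# Full precision separates the two areas; one decimal place does not.
print("full precision differs:", full_a != full_b)
print("one-decimal results equal:", coarse_a == coarse_b)
```

The inputs were measured with the same care in both cases; only the calculation step changed, which is the sense in which the calculation itself can degrade reliability.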

In like manner, other factors can degrade the reliability of a credit measure. Too heavy a weighting of short-term variables may cause large changes in assessments of creditworthiness from period to period, when none, in fact, may be justified. For example, the salesman, paid largely by commission, may have income some months that is disproportionately large or small. Such short-term fluctuations should be weighted appropriately with longer-term trends in order to obtain a relatively stable credit assessment.

A reliable measure, then, is based upon objective criteria (or, when subjectivity is required, upon consistent application of the subjective principles), upon appropriate intermediate calculations, and upon an appropriate mix of both long- and short-term variables.

Validity

The other fundamental issue to consider when dealing with a measurement problem is one of validity: Does the measure really tell us what we think it is telling us? History books (and Monty Python movies) are full of examples of reliable measures that simply are not valid. Intelligence (whatever that is) was once believed to be predicted by measuring the size of the cranium. A witch was thought to be detected by the presence of warts or by tests of floating on water (or, in parody, by comparing weight to that of a duck). Regardless of how reliable or repeatable these measurements are, a fundamental question to ask of credit-assessment techniques is one of their validity.

The fundamental hypothesis of predictive credit models is that similar borrowers tend to behave similarly. Ergo, valid methods of aggregating applicants reliably into groups that were somehow “similar” with respect to credit performance had to be developed. The first and most natural method of doing this is to group borrowers into categories of “good” and “bad.” Surely the good borrowers repaid their loans and the bad borrowers did not. However, as is the case with detecting a witch, detecting a good borrower is a daunting task; often the testing process has lasting consequences. Just as non-witches were defined post hoc as those who sink when tied up and thrown into a pond (with permanent consequences to the one tested), bad borrowers are often only defined post hoc as those who default on their debt (with permanent consequences to the lender).

Early credit scoring, in general, attempted to classify borrowers prospectively into groups of good and bad payers and not simply do so retrospectively; the classification needed to sort borrowers reliably into groups that were valid predictors of the probability of future events. Thus good and bad borrowers became predictive labels, suggesting something about propensity to repay debt.

‘Good Borrower’ Detection

Although the credit-scoring concept has moved much beyond the original notion of distinguishing good from bad credits retrospectively, it is useful to examine the original purposes that recognized the need for such a score. Along with the natural development of this model, certain assumptions will creep into our thinking. After reviewing the development of the traditional credit model, we will examine these assumptions and discuss why a conceptual shift has been necessary to expand our thinking of credit scoring to apply to the subprime markets.

It is easy to picture the desk-thumping bank president demanding that his loan officers make only good loans to good customers. Credit committees, to this day, tend to reach a conclusion that a customer to whom a loan is granted is a good credit. But the underlying idea that is being expressed is really one of relative probability.

Strangely, it was the early development of radio that gave us the roots of modern credit scoring. In those early days of poor reception dominated by static, it was important to develop a theoretic framework for detecting signals amid the noise. This theory of signal detection validates itself by measuring such things as hits, misses, false alarms, and correct rejections. Moving this to our credit context, we create a matrix as shown in Exhibit 1.

We can use this matrix to monitor our decision-making effectiveness. Obviously it is the hits (correct detection of a good borrower) and correct rejections (of a bad borrower) that we are attempting to maximize. And we want to minimize the misses (a good borrower that we mistakenly categorize as bad) and the false alarms (a bad borrower that we mistakenly categorize as good). As each application is examined, the underwriter attempts to determine whether this is a good borrower (the signal) or just another bad one (the noise).
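As an illustration of how such a matrix might be tallied, here is a minimal sketch. The function names and the sample decisions are hypothetical, and the sketch assumes that the eventual outcome is somehow known even for rejected applicants (for example, in a retrospective study sample), which is rarely true in live lending.

```python
from collections import Counter


def detection_outcome(granted: bool, defaulted: bool) -> str:
    """Name the detection-matrix cell for one decision and its eventual result."""
    if granted:
        # Loan was granted: the borrower was categorized as good.
        return "false alarm" if defaulted else "hit"
    # Loan was rejected: the borrower was categorized as bad.
    return "correct rejection" if defaulted else "miss"


def tally(decisions):
    """Count the four cells over (granted, defaulted) pairs."""
    return Counter(detection_outcome(g, d) for g, d in decisions)


# Hypothetical sample: (loan granted?, borrower defaulted or would have defaulted?)
sample = [(True, False), (True, True), (False, False), (False, True), (True, False)]
counts = tally(sample)
for cell in ("hit", "miss", "false alarm", "correct rejection"):
    print(cell, counts[cell])
```

Tracking these four counts over time gives the monitoring view described above: rising false alarms point to lending that is too loose, and rising misses point to lost lending opportunities.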

Different business objectives create biases in decision-making. If we are an originator, paid by volume of loans we approve, we may be biased to grant a disproportionate number of loans (increasing too greatly the false alarms or defaults). Similarly, if it is our money that is lost when a loan defaults, we would be biased to reject a disproportionate number of loans (increasing the misses or lost lending opportunities). The mathematical developments of the 1950s allowed objective measurement of these biases, but they will not be addressed in this work.

Also within this framework of signal-detection theory, a variety of broader statistical tools began to be developed for discriminating between signal and noise populations. These coincided with the improvement of radio receivers, which were able to tune in more precisely on individual frequencies. These same tools proved to have useful application in credit decision-making and provided the fundamental tools for development of the traditional credit-scoring approach. This approach would lead to an initial validation framework for credit-scoring methods.

Exhibit 1: ‘Good Borrower’ Detection Matrix

                              Borrower Categorization at Time of Application
Actual Borrower Category      Good (grant loan)      Bad (reject loan)
Good (won’t default)          Hit                    Miss
Bad (will default)            False Alarm            Correct Rejection

Practical Implementation

One of the early objectives for the credit score was practicality. The empirically derived credit score had to be made available at the point of credit decision-making. Since the early credit-scoring technology was first created in the 1950s, prior to the advent of affordable and readily available computers, the real breakthrough was in creating an easily computed score that could be practically implemented.

In that era of “prime” lending only, the task was one of recognizing those good borrowers who would have the lowest incidence of default. A simple scorecard method was introduced that assigned various point values to variables that were collected on the loan applications. These positive and negative point values resulted from simple operations such as multiplication of an objective borrower variable by one factor, another variable by another factor, or lookups of correct point values. The credit score was the result of summing the total number of points. The time required to calculate the score was just a moment or two, and the required mathematical ability was well within the level of any professional loan officer or underwriter.
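The arithmetic of such a scorecard can be sketched in a few lines. Every variable name, factor, and point value below is a hypothetical placeholder invented for illustration; none comes from an actual scorecard or from this chapter.

```python
# Hypothetical lookup table of points by residence status.
RESIDENCE_POINTS = {"own": 15, "rent": 5, "other": 0}


def scorecard_score(applicant: dict) -> int:
    """Sum points from factor multiplications and table lookups."""
    points = 0
    # Variable multiplied by one factor (capped so long tenure cannot dominate).
    points += 2 * min(applicant["years_at_residence"], 10)
    # Another variable multiplied by another factor.
    points += 3 * min(applicant["years_at_job"], 10)
    # Lookup of a point value from a table.
    points += RESIDENCE_POINTS[applicant["residence_status"]]
    # Negative points for adverse history.
    points -= 20 * applicant["prior_delinquencies"]
    return points


applicant = {
    "years_at_residence": 4,
    "years_at_job": 6,
    "residence_status": "rent",
    "prior_delinquencies": 1,
}
print(scorecard_score(applicant))  # 8 + 18 + 5 - 20 = 11
```

Nothing here goes beyond multiplication, lookup, and addition, which is what made the method workable by hand at the point of decision.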

This easy-to-use scoring method put a powerful, statistically derived tool in the hands of credit decision-makers. An objective and quantitative method was put into service to help the decision-makers justify their credit decisions. The method was reliable and, with the collection of hit and false alarm rates at different credit-score values, it took the first steps toward validation. Default rates for those borrowers with high credit scores were demonstrably lower than for those of borrowers with low scores.

Changing Environment Places Different Demands on Credit Scores

Since the days of those early credit scorecards, many aspects have changed. The market no longer seeks just those borrowers with the best credit. The requirement has evolved to predictability of cash flow, not just low default rate. Several credit businesses today, in fact, cater only to poorer credits: those that the original credit scorecards would have rejected.

The market for credit is built upon predictable flows of cash. While the payment flow of nondefaulting borrowers is routine, also predictable are the aggregate flows of large samples of borrowers who are less creditworthy (or so have several credit companies posited). Consequently, to the extent that a predictable set of flows exceeds the yield requirement established, credit companies have sprung up to capture those flows as loans. Certainly, the loans to lower-credit borrowers have characteristics (such as higher coupons) that compensate for payments missed because of defaults. But predictability of cash flow is the driver of modern credit decisions, not simply the predicted absence of defaults.


Another development that has changed the demands on a credit-scoring process is the relatively low cost and easy accessibility of computers. The virtual ubiquity of the personal computer has trivialized the calculation requirement that was important to the success of the scorecard approach. Very complex and difficult calculations can be easily and practically performed at the point of credit decision-making by lowering the requirement from “human calculator” to that of data entrant.

These developments have, together, given birth to a new generation of scoring technology that provides not just a measure of borrower quality but an actual prediction of each loan’s expected performance. Such technology, which directly estimates a potential borrower’s performance, is often described as predictive modeling or risk-adjusted pricing rather than simply a credit-scoring system.

Through the development of this second generation of credit modeling tools we are able to separate ourselves more fully from the assumptions that initially constrained us. But let us begin with a review of the traditional approach. This will be followed by a critique of its assumptions and, finally, by the introduction of the second-generation tools.

A CONCEPTUAL REVIEW OF THE TRADITIONAL CREDIT-SCORING APPROACH AND METHODOLOGY

As discussed earlier, the underlying challenge in credit-scoring is to measure reliably some set of features of each member in a population of borrowers that results in a meaningful prediction of the borrowers’ credit performance. The measurement process converts the assessment task into a mathematical problem that is easily and visually understandable. The concepts, but not the mathematics associated with this process, are developed in the following sections. The mathematics, as mentioned earlier, are addressed in “Mathematical Aspects: The Linear Discriminant Function.”

Quantitative Mapping

Reliable measurement begins with objective, quantitative mapping of the characteristics of borrowers. Specific rules precisely determine how each instance of a variable is transformed to a number. Age, for example, can be mapped as months since birth at the time of loan application. Income, as another example, may be subdivided into the variables of reported income and verified income, each with specific rules determining how various types of monetary inflows are to be counted.

Certain variables may be qualitative in nature, such as the answers to: Have there been previously reported bankruptcies? Such yes/no variables are quantified by substituting a value of 1 for one occurrence, 0 for the other. Other qualitative variables that feature a list of alternatives are best thought of as a sequence of yes/no variables with each yes/no alternative represented by either a 1 or 0 value in the list.

Upon completing the quantitative mapping, each potential customer is represented as a series of numbers, that is, as a row of values, with each column representing a different measure of that customer. For statistical purposes more fully explained in the concluding sections on mathematics, it is useful to think of these measurements in terms of rows of customers and columns of variables.
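The mapping just described can be sketched as follows. The field names, the list of residence alternatives, and the two sample applicants are assumptions made up for this illustration.

```python
# Hypothetical list of alternatives for one qualitative variable.
RESIDENCE_TYPES = ["own", "rent", "other"]

# Hypothetical raw applications.
applicants = [
    {"age_months": 420, "prior_bankruptcy": "no", "residence": "own"},
    {"age_months": 288, "prior_bankruptcy": "yes", "residence": "rent"},
]


def to_row(application: dict) -> list:
    """Map one application to a row of numbers (one column per variable)."""
    row = [float(application["age_months"])]
    # Yes/no variable: 1 for yes, 0 for no.
    row.append(1.0 if application["prior_bankruptcy"] == "yes" else 0.0)
    # List of alternatives: one yes/no column per alternative.
    row.extend(1.0 if application["residence"] == t else 0.0 for t in RESIDENCE_TYPES)
    return row


# Rows of customers, columns of variables.
matrix = [to_row(a) for a in applicants]
for row in matrix:
    print(row)
```

Each row of `matrix` has the same five columns (age, bankruptcy flag, and three residence flags), so customers can be compared column by column.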

A typical list of variables might begin with the examples shown in Exhibit 2.

Exhibit 2: Typical Abbreviated List of Borrower Variables

Age of borrower in months
Number of dependents
Months at current residence
Months at current employment
Length of time this line of employment
Owns residence (y/n)
Works full-time (y/n)
Mailing address same as residence (y/n)
Cash down-payment amount
Gross monthly income
Gross other income
Verified gross monthly income
Verified debt ratio

While the variables in this list are provided only as an example, some appear very commonly in actual credit models. Added to this list might be co-borrower variables, if any, or measures taken from a credit report, if such information is obtainable. It is important to point out that certain legal restrictions may apply to the type of information that may be used in developing a credit score (race and gender, for example, may not be used in making credit decisions in the United States). Also, in the U.S.A. at least, age may not generally be used in making credit decisions. But if such age measurement is part of a quantitatively derived credit model, there may be instances when it can be used provided that it does not contribute to discrimination against the elderly. Each jurisdiction may have its own laws associated with information use, and these should be verified before implementing a credit model.

Once a suitable list of variables is selected (which list may contain, in some cases, as many as 200 or more different measures), the magic of mathematics takes over.

A Spatial Representation of the Problem

Each customer, as measured by the several variables, can be thought of as representing a specific point in the many-dimensional variable space depicted in Exhibit 3.

Exhibit 3 illustrates several customer data points when plotted in two variables. As the number of variables increases, the dimensionality of the space in which the customers are represented increases. When represented in this multidimensional variable space, customers who score similarly on the several variables tend to be near to each other.

Exhibit 3: Plot of Borrower Characteristics in the Space of Two Arbitrarily Selected Variables
[Figure: scatter plot of borrowers; horizontal axis Months of Residence (10 to 70), vertical axis Income (1700 to 2800).]

For example, the two loans near to each other in the bottom left of the graph would be considered similar in these two dimensions. Similarity of two borrowers is merely a reference to the similarity of the values of their variables (or their proximity to each other in the variable space). And, since the assumption that underlies all credit scoring is that similar borrowers tend to behave similarly, the proximity of the two borrowers (in the lower corner of Exhibit 3) would lead to a prediction that they would have similar credit performance characteristics.
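Proximity in the variable space can be measured in several ways. The sketch below uses ordinary Euclidean distance after rescaling each variable to zero mean and unit standard deviation. The rescaling is an assumption added here so that income (measured in thousands) does not swamp months of residence (measured in tens); the chapter does not prescribe it at this point. The three borrowers are invented data points.

```python
import math
from statistics import mean, pstdev

# Hypothetical borrowers: (months of residence, income).
borrowers = {"A": (12, 1800), "B": (14, 1850), "C": (60, 2700)}


def standardize(points: dict) -> dict:
    """Rescale each variable (column) to zero mean and unit standard deviation."""
    columns = list(zip(*points.values()))
    means = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return {
        name: tuple((v - m) / s for v, m, s in zip(values, means, spreads))
        for name, values in points.items()
    }


def distance(p, q) -> float:
    """Euclidean distance between two points in the variable space."""
    return math.dist(p, q)


z = standardize(borrowers)
print("A-B:", round(distance(z["A"], z["B"]), 3))
print("A-C:", round(distance(z["A"], z["C"]), 3))
```

Borrowers A and B lie close together and far from C, so under the similarity hypothesis A and B would be expected to perform alike.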

Driven by ‘Good’ versus ‘Bad’ Distinction

The development of a traditional credit scoring model is easy to understand when approached from a graphical perspective. The basic approach, if you will recall, attempts to discriminate good (nondefaulting) loans from bad (defaulting) loans. Two samples are taken, one of borrowers that have defaulted and the other of borrowers that did not default but paid their loan off at maturity. Although, as discussed earlier, the defaulting and the nondefaulting borrowers usually are mixed together and overlap, a large sample of defaulted loans is likely to have a concentration of data points in one or more regions, and a sampling of the nondefaulting loans is likely to have a concentration of data points in other areas.

Traditional credit-scoring models usually presume that all nondefaulting borrowers constitute one population with a single mean, and, similarly, that all defaulting borrowers also constitute one population with a single mean. While these assumptions are not generally true in an absolute sense (particularly not when dealing with subprime populations), for the sake of this model development the assumption is made. It is often interesting to observe just how robust these assumptions turn out to be, however, when used to detect prime borrowers, as was their original use.

In general, the main principle underlying development of the traditional credit score is that nondefaulting borrowers and defaulting borrowers, although hopelessly mixed together, are thought to represent different populations of individuals. These populations are considered to have different characteristics (or means), but the range of values associated with each variable (or variance) is thought to make it very difficult to discriminate one from the other, except in retrospect!

The Linear Discriminant Function

A sample of good and bad loans is identified to begin the development of a linear model for discriminating between the two supposed populations. The linear model becomes the scorecard that produces the specific credit score for an individual. Although the statistics are presented more formally in the later sections on mathematical aspects, a more general appreciation for the process will be discussed in this section.

As described earlier, the two populations of borrowers (those that default and those that do not default) are considered to be different (that is, the means of the distributions are different). But because the variance found in each variable measure is generally large, the actual populations overlap considerably. The task is to distinguish between the two. In general, this would be considered a classification problem; that is, one is interested in making an observation of a borrower and then correctly classifying that observation as belonging to the population that defaults or to the population that does not default.

In building the traditional scoring model, the variance is considered to be “normal” (that is, the typical bell-shaped curve studied in our statistics classes applies). Exhibit 4 illustrates the concept of two different populations having different means, but with enough variance to cause the two populations to blend. It is the overlap of the two populations (nondefaulting and defaulting) that makes the classification task difficult.

In Exhibit 4, which deals with just one variable (or one dimension, as we are wont to call it), a solution to the dilemma of distinguishing between the two populations can be readily seen. It would be possible to create a decision rule, or criterion point, that divides the two populations as meaningfully as possible. Although a debate could rage about exactly where to place that cutoff point, it would generally be between the peaks of the two distributions.

Exhibit 4: Two Borrower Populations
[Figure: relative frequency (0% to 6%) plotted against variable value for two overlapping bell-shaped curves, labeled Defaulting and Non-Defaulting.]

One typical solution would be to place the dividing criterion for the two populations at the point where the two distributions seem to dive into one another, as is the case in Exhibit 4. Borrowers scoring to the right of this point would be classified as nondefaulting and those scoring to the left of this point as defaulting. By placing the criterion at this intersection, as it is drawn, one maximizes the probability of correctly classifying an observation into its appropriate population. The intersection of the two probability functions implies that, at that point, the two alternatives are equally probable.
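The claim about the intersection can be checked numerically under simplifying assumptions. The sketch below assumes two normal populations of equal size and equal variance, with means and standard deviation chosen arbitrarily for illustration; in that special case the curves cross exactly midway between the two means. With unequal variances or unequal population sizes the crossing point moves, and this shortcut no longer applies.

```python
from statistics import NormalDist

# Hypothetical populations on a single variable (equal variance, equal size).
defaulting = NormalDist(mu=40.0, sigma=10.0)
nondefaulting = NormalDist(mu=60.0, sigma=10.0)


def accuracy(criterion: float) -> float:
    """Share of borrowers classified correctly when scores above the
    criterion are called nondefaulting and scores below are called defaulting."""
    correct_good = 1.0 - nondefaulting.cdf(criterion)
    correct_bad = defaulting.cdf(criterion)
    return 0.5 * (correct_good + correct_bad)


# With equal variances, the two density curves intersect at the midpoint.
intersection = (defaulting.mean + nondefaulting.mean) / 2

print("densities equal at intersection:",
      abs(defaulting.pdf(intersection) - nondefaulting.pdf(intersection)) < 1e-12)
for c in (45.0, intersection, 55.0):
    print(f"criterion {c:.0f}: accuracy {accuracy(c):.4f}")
```

Moving the criterion in either direction away from the intersection lowers the overall share of correct classifications in this setup.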

However, in credit decisions, the cost of making a bad decision might create a bias away from simply wanting to be most frequently correct in the classification process. (With some thought, it is apparent that for borrowers falling right at this point of intersection, half of the loans are expected to default.) The cost of defaulting loans may bias one’s opinion away from accepting a 50% default rate (at the point of intersection), and create a criterion point that has a somewhat smaller percentage of observations that are expected to default. The criterion for smaller default rates would be to the right of that point of intersection. This would, of course, mean that some good borrowers are rejected, but this bias would ensure that at the criterion point more than half of the borrowers will not default. Exhibit 5 illustrates this concept in two dimensions, and allows us to see one other issue that should be considered.
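The one-dimensional criterion just described can be sketched numerically. The following is only an illustration under assumptions the text does not state: both populations are taken to be normal with a common standard deviation and to be equally numerous. The function names and the example means (40 and 60) are hypothetical.

```python
import math

def default_probability(x, mu_default, mu_good, sigma):
    """P(default | x) for two equally likely normal populations sharing sigma."""
    midpoint = 0.5 * (mu_default + mu_good)
    # Log of the density ratio good/default at x (equal-variance normals).
    log_ratio = (mu_good - mu_default) * (x - midpoint) / sigma ** 2
    return 1.0 / (1.0 + math.exp(log_ratio))

def criterion_for_target(p_target, mu_default, mu_good, sigma):
    """Criterion point at which the expected default rate equals p_target."""
    midpoint = 0.5 * (mu_default + mu_good)
    shift = sigma ** 2 * math.log((1.0 - p_target) / p_target) / (mu_good - mu_default)
    return midpoint + shift

# Hypothetical populations: defaulters centered at 40, nondefaulters at 60.
intersection = criterion_for_target(0.5, 40.0, 60.0, 10.0)   # where the curves cross
conservative = criterion_for_target(0.2, 40.0, 60.0, 10.0)   # accept at most 20% defaults
```

Under these assumptions the 50% criterion falls midway between the two means, and lowering the acceptable default rate moves the criterion to the right, toward the nondefaulting population.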

Exhibit 5 represents a population of nondefaulting borrowers and a population of defaulting borrowers as concentric ellipses, which indicate curved distributions much as a topographical map would indicate the shapes of two hillocks. The two populations overlap, or are mixed together as in Exhibit 4, represented by the intersecting curves describing their shapes.

In seeking to discriminate between these two populations, one must consider not only the locations of the means of the populations but the variance and covariance as well. The two populations in this figure, defaulting and nondefaulting borrowers, have means that are connected by the dashed line A. One might view line A as simply a way to rotate the two-dimensional case into another one-dimensional case identical to that illustrated in Exhibit 4 and select a criterion point as described earlier.

However, because of the multidimensional nature of the two populations and their shape (that is, the covariance of the variables measured), selecting a criterion point is not enough. One must also select the angle, or tilt, of the criterion rule (in a two-dimensional problem the criterion is a line; in three dimensions, a plane; and so forth). The dotted line B is perpendicular to (or “orthogonal to,” as a mathematician might say) the line A connecting the means. This line B, however, obviously would not be a good criterion line because it is evident that in the lower part of the figure it would include a disproportionate part of the defaulting population relative to the part of the population included in the upper part. The solid line C is rotated to take into account the covariance of the populations and maintain a balanced division between the two.

In general, for discrimination tasks of multiple dimensions, a surface is selected that has the position and rotation required to divide the populations appropriately, as shown in Exhibit 6.

The positioning of this surface is adjusted so as to include only an acceptable probability of incorrectly classifying a defaulting borrower as nondefaulting. Exhibit 6 presents a series of such lines, C1, C2, etc., that would be increasingly conservative about the inclusion of defaulting borrowers among those deemed nondefaulting. Each of this series of criterion surfaces is parallel to the others and, for a credit-scoring system, the direction of increasingly conservative decisions is arbitrarily chosen to be positive (in the direction of the nondefaulting borrowers).

As may now be apparent, the direction of increasingly conservative decisions, which is simply a line pointing generally in the direction of the good borrowers in the multidimensional borrower space, is really nothing other than a credit-scoring function. This process of determining the surfaces C1, C2, etc. and the function described by that line is called, in statistical terms, the development of a linear discriminant function between the two populations.
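A two-variable version of this construction can be sketched as follows. The chapter does not say which estimator is used, so this sketch assumes Fisher's classical linear discriminant with a pooled covariance; the helper names and the eight data points are invented for illustration.

```python
def mean_vector(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def pooled_covariance(good, bad):
    """Pooled within-group covariance terms of two 2-variable samples."""
    sxx = sxy = syy = 0.0
    for group in (good, bad):
        mx, my = mean_vector(group)
        for x, y in group:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
    dof = len(good) + len(bad) - 2
    return sxx / dof, sxy / dof, syy / dof

def discriminant_direction(good, bad):
    """Direction w = S^-1 (mean_good - mean_bad); w . x acts as a raw score."""
    sxx, sxy, syy = pooled_covariance(good, bad)
    gx, gy = mean_vector(good)
    bx, by = mean_vector(bad)
    dx, dy = gx - bx, gy - by
    det = sxx * syy - sxy * sxy
    # Multiply the mean difference by the inverse of the 2x2 covariance matrix.
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)

def raw_score(w, point):
    return w[0] * point[0] + w[1] * point[1]

# Invented samples in which the two variables are positively correlated.
good = [(4, 3), (5, 5), (6, 4), (7, 6)]
bad = [(1, 1), (2, 3), (3, 2), (4, 4)]
w = discriminant_direction(good, bad)
```

With these invented points the difference of the means is (3, 2), but the resulting direction is roughly (2.33, -0.67). The covariance tilts the scoring direction away from the line joining the means, which is the same effect that distinguishes line C from line B.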

Exhibit 5: Two Multivariate Populations (figure: axis Variable 1; lines A, B, and C)

Calibration of Function for Expected Default Rates

Once the linear discriminant function is determined, a fully functional credit-scoring model is obtained by adjusting the zero point and the scale of the function so that, when calculating the value resulting for each borrower’s variable values, a value consistent with credit-scoring conventions is obtained. Values, for example, are usually expected to be between 0 and 1,000, with certain score values corresponding to a certain level of expected defaults.

Some model developers would seek consistency with other models that they have developed and would attempt to calibrate the values of their various discriminant functions so that a score of 620, for example, would predict approximately the same percentage of defaults across all of their models. This is often called the calibration process for the model; it attempts to make historical comparisons of credit scores roughly comparable from population to population or from year to year.
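The adjustment of zero point and scale amounts to a linear rescaling, which can be sketched as below. The anchor values are hypothetical: they stand for two raw discriminant values at which the development sample showed the default rates that scores of 500 and 700 are meant to represent. The function name and the clamping to the 0 to 1,000 range are likewise assumptions of this sketch.

```python
def calibrate(raw_low, score_low, raw_high, score_high):
    """Return a function mapping raw discriminant values to calibrated scores."""
    scale = (score_high - score_low) / (raw_high - raw_low)
    zero_point = score_low - scale * raw_low

    def to_score(raw):
        # Clamp to the conventional 0 to 1,000 range mentioned in the text.
        return max(0.0, min(1000.0, zero_point + scale * raw))

    return to_score

# Hypothetical anchors: raw value 4.0 should read 500, raw value 10.0 should read 700.
to_score = calibrate(4.0, 500.0, 10.0, 700.0)
```

Calibrating several models against the same pair of default-rate anchors is one simple way to make a given score mean roughly the same thing across them.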

Difficulties and Limitations Associated with Traditional Credit-Scoring Systems

Although the traditional credit-scoring model has been shown to be quite effective in separating the extremely good credits from the extremely bad, there are a few difficulties and limitations associated with these systems. The first to be addressed deals with the conceptualization of the problem and the theoretic framework itself.

The most serious problem, as alluded to earlier, is the basic assumption that all defaulting borrowers come from a single population with a common mean and that all nondefaulting borrowers come from another single population with a common mean. This would imply that the borders between the defaulting and nondefaulting populations are smooth across many variables. As is particularly the case with subprime populations, these borders are not smooth, and significant interactions are very often found among variables, making some combinations much more susceptible to higher default rates than traditional credit-scoring models would predict.

Each of the significant problems associated with the traditional credit-scoring process is addressed in this section. Some of these problems are shown to be simply tradeoffs made for computational simplicity and really do not cause serious difficulties, especially when used with prime borrower populations. Other problems, however, are more significant and contribute to the need for the new generation of credit-modeling techniques.


Single Populations with Smooth Borders

Experience with subprime populations of borrowers reveals that the shapes of borrower populations that default and those that do not default are quite complex. This is illustrated in Exhibit 7.

In Exhibit 7, each symbol represents a group of approximately 300 actual customers, plotted along two arbitrary variables associated with information on their loan applications. The asterisks, primarily in the lower left of the figure, represent groups of customers who were denied credit by underwriters evaluating their applications. The crosses and the circles represent groups of customers whose applications were approved and who were ultimately granted credit for an automobile loan.

The symbols for those groups accepted for credit are further divided into two groups, as follows: At the lower left of the main cluster is a relatively flat clustering of circles representing groups of borrowers who demonstrated high default rates. The remaining cross symbols represent borrowers with relatively low default rates. It is striking that both the groups of borrowers deemed worthy of credit and, then again, those experiencing the highest default rates, appear to be clustered together. It is a confirmation of the basic assumption of predictive credit modeling that such clusters emerge (similar borrowers behave similarly). The features measured by the two variables actually do reveal some similarities that translate into a predictive ability.

Exhibit 7: Groups of Borrowers Plotted in 2 Dimensions (figure: groups plotted against Variable 1 and a second variable, with axis ticks running from -30 to 10 and from -10 to 30; legend: Rejected, Accepted High Defaults, Accepted Low Defaults; the means of the nondefaulting and defaulting samples are marked)

It should be pointed out, however, that within almost every group of borrowers, even those in the low-default-rate groups, there were borrowers who defaulted on loans. Similarly, within even the groups showing the highest default rates there were loans that did not default. In fact, defaulting borrowers are so thoroughly mixed in with nondefaulting borrowers that it would be impossible ever to state with credibility that such and such a loan will, with certainty, default; or will, with certainty, not default. These figures simply show trends and tendencies that can be measured only as probabilities.

It is also apparent from Exhibit 7 that the shape of these populations is rather complex. The gap between the population rejected for lending and that granted loans is shaped by the underwriting criteria. However, the population of those granted loans with the higher and lower default rates is shaped by the vagaries of the variables themselves. It should be apparent that the borders between the higher- and the lower-default-rate groups are not smooth. (Although the criterion for dividing between high and low default rates was arbitrary here, at virtually any decision rule the same jagged borders are observed.)

Sampling Dependencies of Credit Scores

The protrusions and evaginations of the populations of higher- and lower-default-rate borrowers create another problem, one of sampling dependencies. As was illustrated in Exhibit 7, the contours for any given default rate within the borrower population are not smooth, as is the assumption with traditional credit scoring. Instead, there are bumps and valleys along the various variables. This has a tremendous effect on the traditional credit score development.

Exhibits 8a and 8b illustrate a population of potential borrowers. Those to the lower left, marked with a circle, will have the higher default rates; those to the right, marked with a cross, will have the lower default rates. If the underwriting criteria are such that the group indicated in Exhibit 8a is granted loans, then the population of defaulting borrowers would have a mean marked by point Ba, and the nondefaulting borrowers would have a mean marked by point Ga. This sampling of defaulting and nondefaulting borrowers would give rise to a discriminant function Ca with its own credit score function.

Exhibit 8a: Credit Score as Function of Data Sampled (Larger Sample from Less Conservative Underwriting) (figure: axis Variable 1; means Ga and Ba; discriminant function Ca and the corresponding credit score function)

Exhibit 8b: Credit Score as Function of Data Sampled (Smaller Sample from More Conservative Underwriting) (figure: axis Variable 1; means Gb and Bb; discriminant function Cb and the corresponding credit score function)

If, on the other hand, the underwriting criteria were such that the group indicated in Exhibit 8b were granted loans, then the population of defaulting borrowers would have a mean marked by point Bb, and the nondefaulting borrowers would have a mean marked by point Gb. This sampling of defaulting and nondefaulting borrowers would give rise to a discriminant function Cb with its own credit score function. The different credit scoring functions of Exhibits 8a and 8b are an artifact of the underwriting.

In fact, as underwriting changes and even as market penetration changes, credit score functions will change. The population from which borrowers are drawn exerts a tremendous influence on the “direction” of the credit scoring function. This is an unfortunate artifact of the traditional scoring framework. As will be shown below, however, the second-generation techniques of risk-adjusted pricing have overcome these shortcomings.

Single-Dimensional Score: Simplicity versus Loss of Information

It is useful to mull over one of the underlying premises of the traditional credit-scoring approach, that there is a single direction in the borrower variable space that captures the essence of being a good borrower. The credit score function lines in Exhibits 8a and 8b and the accompanying discussion earlier in this chapter illustrate the rigidity of this concept. Provided that the assumptions about the single mean for the defaulting and nondefaulting populations, their normal distributions, etc., are roughly true, this single direction should generally point in the direction that maximizes discrimination between the defaulting and the nondefaulting populations.

However, even with these assumptions there is an obvious simplification made when all borrowers are aligned and projected onto a single dimension. Exhibit 9 shows two groups of borrowers, V and W, which lie at the same “credit score” value (both on the same perpendicular line C) but which might have very different group characteristics.

The simple credit-scoring model compresses all of these groups onto the same one-dimensional measure (C) and ignores any differences among them.

The loss of information when all groups are projected onto a single dimension has been studied, and information theory provides a mechanism for measuring this loss.

Information has a formal definition that can be seen intuitively. Consider the variability in the original borrower data (that is, the 50 or 100 variable measures on each borrower, which we can call the original borrower space), and how that variability is preserved as one attempts to map these data back and forth into another space. For example, one could map the borrower data from the original borrower space into the one-dimensional credit score space (using the credit score function), and then attempt to map back into the original borrower data space. One can compare the variability in the borrower data after performing this mapping, back and forth, to the original variability in the data prior to such mapping.

Mathematically, one can determine the maximum amount of information that could possibly be preserved by mapping from the original borrower space to any single dimension by measuring the ratio of the greatest eigenvalue of the borrower covariance matrix to the rank of that matrix. These are mathematical terms, perhaps forgotten or barely learned by most, that may best remain in the domain of statisticians. The reader need not worry that any portion of this chapter, except for the concluding sections on mathematics, requires an understanding of these terms.

For one specific (and rather typical) population of 93,000 borrowers, with 39 variable measures, this process showed that at most 12.32% of the information about the borrowers could be preserved by mapping onto a one-dimensional space.
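The ratio described above can be computed in a few lines. This sketch assumes that the variables have been normalized, so that the covariance matrix is a correlation matrix of full rank and its rank equals the number of variables. Power iteration is used here only because it needs no external library; it is not claimed to be the method behind the 12.32% figure, and the function names are hypothetical.

```python
import math

def largest_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix (power iteration)."""
    n = len(matrix)
    # Uneven starting vector, so that an exactly orthogonal start is unlikely.
    v = [1.0 + i / n for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    # Rayleigh quotient; v already has unit length.
    return sum(v[i] * mv[i] for i in range(n))

def max_information_share(correlation):
    """Largest eigenvalue divided by the number of variables."""
    return largest_eigenvalue(correlation) / len(correlation)

# Two variables with correlation 0.5: eigenvalues are 1.5 and 0.5.
share = max_information_share([[1.0, 0.5], [0.5, 1.0]])
```

For two variables correlated at 0.5, a single dimension can preserve at most 75% of the variability. With many weakly correlated variables the share falls quickly, which is consistent with the small percentage reported for the 39-variable population.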

This suggests that a credit score, while simple to calculate and generally useful in discriminating between a defaulting and nondefaulting population, does so with great loss of information. The simplification of the problem into the typical credit scorecard will toss out the majority of the information about the borrowers (approximately 88% in the example cited). The resulting, simplified measure of the borrower fails to discriminate between or among many borrowers who may, in fact, be very different from each other.

Exhibit 9: Two Multivariate Populations (figure: axis Variable 1; line C; the direction of increasingly conservative decisions; Group “V” of Borrowers and Group “W” of Borrowers)

Subprime Lending Is Interior-Focused

Another difficulty associated with the typical credit-scoring model arises in its application to the nonprime borrower. The traditional credit-scoring technology was developed in the days when discriminating between the best borrowers and the worst borrowers was the required task. The marketplace has changed, however, so that today’s subprime lenders do not seek the best (prime) credits but, rather, acceptable credits. The typical use of a credit score among subprime lenders is to make subtle discrimination among borrowers who are all nonprime, attempting to find neither those that are extremely good nor extremely bad but, simply, to find those who are somewhere in between and who will not default as frequently as others.

The traditional credit-scoring models are based on the premise that one is intending to maximize the likelihood of distinguishing between good and bad credits. And as indicated, too often this is not the use to which they are put. Consequently, when one attempts to use these standard credit models in subprime lending, it is a bit like using a socket wrench to drive a nail or a hammer to remove a bolt. While the models may work in a crude sort of way, they are simply the wrong tool. As shown below, however, the proper tools do exist and they are very effective at solving the problems for which they are designed.

Monotonicity: A Dangerous Generalization

One final assumption implicit in the traditional credit scores is that the credit function is essentially monotonic (or smoothly progressing from the worst to the best credits). While it might be easy to observe that all borrowers with a score of 700 are better credits than those with a score of 400, the models also predict that all borrowers with a score of 550 are better credits than those with a score of 545. Experience, unfortunately, tells us this is simply not the case.

Instead, experience tends to indicate that credit scores in the mid-range are not very predictive. In other words, in the precise range sought by nonprime lenders, credit scores do not foretell accurately which loans will perform better than others. The credit scores are helpful in identifying the borrowers as nonprime but, beyond that, it is difficult to make further discrimination.


The credit score’s focus on extrema, as explained above, is one reason that this occurs. A good example might be seen in a variable like number of dependents. The typical credit score would most likely give some positive weighting to this variable (such as adding three points for each dependent child). However, experience has shown that some circumstances (perhaps a single-parent, marginally employed borrower) may suggest just the opposite. The credit score’s immutability means that it applies what is generally true to predicting the performance of every case.

The inability of credit scores to take into account special circumstances, especially those that seem intuitive with actual lending experience, has made many users wary of incorporating this sort of predictive model. As nonprime lenders come to realize, each type of borrower is really a different special case, and the monotonic tools that work for identifying the best borrowers are actually limited in their predictive abilities in just the range that is most needed.

TECHNOLOGY ADVANCES: PREDICTION BASED ON SIMILARITY AND ACTUAL HISTORY

Technology to overcome the limitations of credit scoring, as described earlier, has become available as the cost of computational power has fallen. It is now feasible to approach predictive credit assessment from an entirely new, but inherently simpler, perspective. This new approach, known as data mining, has actually been made possible by advances in data-processing techniques. The presumption of data mining is that the best method for predicting performance of a loan is simply to use the history of previously made, similar loans.

The actual methods used may vary but, essentially, when a predictive assessment of the creditworthiness of a loan is sought, an enquiry is made to the loan history database. A group of similar loans is identified, complete with their history since origination. The summarized information of these peers is used to predict the behavior of the new loan.

The prediction of each new loan’s behavior, with these techniques, is based not upon a theoretical relationship developed from a sample of good and bad loans (as in traditional credit scoring) but upon actual experience. The advantages of this approach are numerous.

Measurement Using the Neighborhood Approach

The major conceptual hurdle to overcome when modernizing one’s view of credit assessment is really nothing more than a change of perspective. Instead of grouping all defaulted loans together and seeking the average location of defaults (as is done with traditional credit scoring), one groups borrowers into something akin to a neighborhood and determines frequency of default (and other measures of loan performance). Thus, the task becomes a two-step process of aggregating into groups, then measuring the attributes of those groups.

Although the aggregation of borrowers into groups, as a matter of practical implementation, precedes the measurement of the attributes of the various groups, the discussion of grouping techniques will be deferred. Instead, we focus on conceptual issues first and leave the discussion of the techniques actually used to the final section of this chapter.

Imagine the original, two-population view used in developing the traditional credit score model, as pictured in Exhibit 10.

Instead of dividing the population into two groups (those that defaulted and those that did not) and then trying to determine the location of greatest concentration of each group (the credit-score method), imagine that we arbitrarily divide the population into several small groups (without respect to their loan performance). Exhibit 10 shows this division with the grid overlaying the borrower distributions. (In reality, the subdivisions are much, much smaller.)

Exhibit 10: Two Multivariate Populations, Gridlike Division into Smaller Populations (figure: axis Variable 1; a grid overlays the two borrower distributions)

Now, we actually measure what has occurred historically within each group (or neighborhood). We make no assumption about there being only two populations (good and bad credits) nor do we presume that there is a smooth transition from good to bad. We do not even have to limit our questions to default rate. We simply find the group of borrowers in which we are interested and then measure the attributes of concern.

When a new borrower is presented, this methodology simply identifies the neighborhood to which the borrower belongs and then reports the expected behavior of that borrower’s group. It is simply the opposite perspective of the traditional approach (group, then measure, versus the old: divide by defaults, then find the division function). This approach sets us free from some of the troubling assumptions and limitations of the older methods.
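The group-then-measure idea can be sketched with the simple grid of Exhibit 10. The chapter defers its actual grouping techniques to a later section, so the fixed square grid here is only a stand-in; the function names, the cell size, and the six historical loans are invented for illustration.

```python
from collections import defaultdict

def cell_of(borrower, cell_size):
    """Grid cell (neighborhood) containing a borrower's two variable values."""
    x, y = borrower
    return (int(x // cell_size), int(y // cell_size))

def measure_neighborhoods(history, cell_size):
    """Observed default rate per grid cell; history holds ((x, y), defaulted) pairs."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [loans, defaults]
    for borrower, defaulted in history:
        tally = counts[cell_of(borrower, cell_size)]
        tally[0] += 1
        tally[1] += 1 if defaulted else 0
    return {cell: defaults / loans for cell, (loans, defaults) in counts.items()}

def predict(borrower, rates, cell_size):
    """Report the historical default rate of the new borrower's neighborhood."""
    return rates.get(cell_of(borrower, cell_size))  # None if the cell has no history

# Invented history: four loans in one neighborhood, two in another.
history = [((1.2, 3.4), False), ((1.8, 3.9), True), ((1.5, 3.1), False),
           ((1.1, 3.7), False), ((6.3, 0.4), True), ((6.9, 0.8), True)]
rates = measure_neighborhoods(history, cell_size=1.0)
```

No discriminant function is fitted at any point. Any other attribute of interest could be tallied per cell in the same way, and a new borrower is scored simply by looking up the cell into which he or she falls.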

As will be shown, this methodology is enormously useful with subprime populations. In practice, it also proves very interesting with prime borrower populations (which are well served by traditional credit scores) because multiple attributes can be measured.

Default Survivorship

As we recall from the development of the traditional models, a credit score defines a measure of discrimination between two presumed simple populations: those that default on loans and those that do not. Unfortunately, that is not what subprime lending is all about. Subprime lenders seek to predict the cash flows that will result from a loan and make an economic decision about whether or not to fund it, depending upon the timing and magnitude of the loss expected. While a credit score may be related to these aspects, it is simply an altogether different animal.

Using the neighborhood approach, each group that has been defined is measured. There may be nine groups or there may be nine hundred groups of borrowers defined. Each of these is examined and the profile of actual defaults within the group is assessed. Usually, this measurement is performed with respect to timing of default and percentage of loans that have survived default through each period.

Since the loans that fall into each of these neighborhoods have usually been originated at different times, a rough function is calculated that identifies the percentage of loans of age t or older that have survived. Exhibit 11 illustrates such a function.

In calculating this function, one normally observes that there are many loans of age one month or older, a bit fewer of age two months, fewer still of age three months, and so forth. This is because a portfolio (from which one measures) is dynamic. Often, after a certain age, the number of loans observed that are of that age or older becomes very small. When the number of loans for such time period is small, the estimate of the percentage that survive to that age becomes unstable. Consequently, several compensation techniques have been developed.

What is important to note, however, is that each neighborhood is characterized by its own individual default survivorship function. Thus, when a new borrower is presented, the neighborhood to which it belongs is examined. The precise function of defaults is determined by the history of other, similar borrowers and this default function is applied to the new borrower as the best predictor of that borrower’s performance.
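One way to compute such a survivorship function for a single neighborhood is sketched below. The chapter does not give its formula, so this uses a standard life-table (Kaplan-Meier-style) estimate as a stand-in; the data layout, the function name, and the four example loans are assumptions. Compensation for thinly populated ages is not attempted.

```python
def survivorship(loans, horizon):
    """Share of loans surviving default through each month 1..horizon.

    loans holds (months_observed, defaulted) pairs; a defaulted loan is
    assumed to have defaulted in its last observed month.
    """
    curve = []
    surviving = 1.0
    for month in range(1, horizon + 1):
        # Loans that reached this age are the ones that could default in it.
        at_risk = sum(1 for age, _ in loans if age >= month)
        defaults = sum(1 for age, defaulted in loans if defaulted and age == month)
        if at_risk > 0:
            surviving *= 1.0 - defaults / at_risk
        # With no loans of this age, the last estimate is simply carried forward.
        curve.append(surviving)
    return curve

# Invented neighborhood: loans originated at different times, hence different ages.
loans = [(1, True), (2, False), (3, True), (3, False)]
curve = survivorship(loans, horizon=3)
```

In this small example only two loans are observed at age three months, so a single default there halves the estimate. That is the instability at older ages described above.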

Other Measures

Just as expected default frequency is adjusted for each borrower based on the history of his or her own neighborhood, other measures are also adjusted. Often, there are delays associated with repossessing collateral or variations in collateral condition that manifest significant changes within borrower neighborhoods.

Using the neighborhood method, virtually any measure of interest can be determined for a group of borrowers. The changes in values sometimes appear to vary systematically from one part of the population to another and, sometimes, the measures appear to have local clusters of values with higher values centered in particular areas. This becomes quite useful when making predictive models. For each neighborhood the future behavior of each loan is predicted by the history of each of the loan’s peers within that cluster.

Exhibit 11: Actual and Estimated Monthly Default Rates (figure: default rate, 0.0% to 3.0%, by month, 1 to 59; series: Actual and Estimated)

This ability to assess each measure of interest separately, and to do so independently of the predetermined assumptions that traditional credit scoring requires, gives this second generation of predictive models extraordinary versatility. This category of tools is usually called data mining, because the history for each group is mined out of the existing historical records of similar loans.

Avoiding the Pitfalls of Monotonicity

Among the greatest advantages of this data-mining approach is elimination of the assumption of monotonicity. Instead of presuming that there is a smooth transition from the best credits to the worst, this approach recognizes that relationships among the borrower variables may vary. At some income levels, the debt-to-income ratio may be more predictive than at others or the number of delinquencies on previous debt may be more predictive.

This approach eliminates the relationships that are hard-wired into a credit score, such as: $300 of additional income increases the credit score by the same amount as six months of additional time of employment. There is no presumption of any consistent relationships among the borrower variables. Instead, the simple historical performance of similar loans provides the framework for estimating the future performance of a new loan.
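The hard-wired tradeoff can be made concrete with a toy fixed-weight scorecard. The weights, the base value, and the reading of income as a monthly dollar amount are all hypothetical, chosen only so that the example in the text holds.

```python
# Hypothetical weights, chosen so that $300 of income equals six months of employment.
POINTS_PER_DOLLAR_OF_INCOME = 0.01
POINTS_PER_MONTH_EMPLOYED = 0.5

def linear_score(monthly_income, months_employed, base=400.0):
    """A fixed-weight score: each variable contributes independently of the others."""
    return (base
            + POINTS_PER_DOLLAR_OF_INCOME * monthly_income
            + POINTS_PER_MONTH_EMPLOYED * months_employed)

def gain(income_a, months_a, income_b, months_b):
    """Change in score when a borrower moves from profile a to profile b."""
    return linear_score(income_b, months_b) - linear_score(income_a, months_a)

# The same increments are worth the same points at any starting income.
low_income_gain = gain(1500, 12, 1800, 12)    # $300 more income, low earner
low_tenure_gain = gain(1500, 12, 1500, 18)    # six more months employed
high_income_gain = gain(6000, 12, 6300, 12)   # $300 more income, high earner
```

All three gains come to about three points. A linear score cannot express the idea that an extra $300 matters more to a low earner than to a high one, which is the kind of varying relationship the neighborhood approach is free to pick up from history.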

Assessment of credit by identifying a group of similar loans and using their performance as a guide is a very pragmatic approach to predicting behavior. The primary assumption is reduced to that basic generalization of credit prediction: Similar borrowers tend to behave similarly. The problem is simplified into one of assessing only the similarities among borrowers.

Conceptually, this is very much like the approach used in appraising the value of real estate. Within a particular neighborhood, comparable properties are examined to determine such factors as sale prices and income-producing capabilities. In this approach to credit assessment, however, comparable borrowers are sought whose history is understood and they are examined to provide insight about credit-related behavior. And, just as in real estate appraisal, the selection of those to whom the comparison is made is critical to appropriate valuation.


The New Problem: Assessing Borrower Similarity

Recalling the fundamental premise of all credit modeling, that similar borrowers behave similarly, it can be seen that increasingly sophisticated tools are being used to identify similar groups of borrowers which then are used to predict credit performance. Perhaps the most promising advances in this area deal with multidimensional representations of borrowers.

Similarity, as has been used intuitively up until now, has really meant proximity in the variable space of the borrowers. A distance measure between two points is directly substitutable for the concept of measuring how alike two borrowers might be. When both borrowers have a credit score of 574, they are considered alike. However, that single measure of credit score has limitations, as developed earlier; the predictive models used in subprime modeling expand on this concept of similarity.

It should be pointed out that, as a precondition to comparing variables that have widely different values and ranges (such as income, which is measured in thousands of dollars, and income ratios, which are measured in fractions), the variables are usually normalized. This most often means that the average of each variable is subtracted from that variable and the difference is divided by its standard deviation. All normalized variables then have a mean of 0.0 and a standard deviation of 1.0, which makes them more readily comparable with each other. Each normalized borrower variable is then an indication of how many standard-deviation units that value is above or below the average value for all borrowers.
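The normalization just described can be sketched in a few lines. This is only an illustration: the borrower values are hypothetical, the function name `normalize` is not from the text, and the population standard deviation (NumPy's default) is assumed.

```python
import numpy as np

def normalize(X):
    """Subtract each column's mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical borrowers: [monthly income in dollars, debt-to-income ratio]
borrowers = np.array([
    [2400.0, 0.35],
    [3100.0, 0.42],
    [1800.0, 0.28],
    [4000.0, 0.50],
])

# Each entry of Z is the number of standard deviations above or below average.
Z = normalize(borrowers)
```

After this step the income column and the ratio column are on the same scale, so neither dominates a distance calculation simply because of its units.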

The second-generation credit approaches usually begin with the acknowledgment of the shortcomings of a traditional credit score and the subsequent development of another measure of something else that seeks to incorporate additional borrower variables. An example might be seen in those companies that have built and used a general credit-scoring model that utilizes certain variables on a loan application. But, realizing the additional information about the borrower that is provided by an extensive credit report, which is not fully utilized in the original credit score, they build another credit report score that incorporates much of this otherwise unused information. These companies then use both the original credit score and the credit report score to predict borrower behavior more effectively.

Because of the additional information that is utilized in predicting borrower performance, these two-dimensional or three-dimensional models often significantly enhance lenders’ forecasting ability. However, the ad hoc addition of supplemental credit scores does not always assure a better predictive model.

The general approach of multidimensional representations of borrowers and multiple measures of credit performance has been shown to be a very productive and exciting direction in credit modeling. The best approaches rely upon more rigorous methods and certainly point the way for future credit-score development.

Correlated Variables and Information Retention

The specific problems that credit scores, in general, seek to overcome, and that multidimensional models of borrowers solve, deal primarily with two related issues. The first is that in trying to represent borrowers in some meaningful way using credit variables obtained at loan origination, the serious statistical problem of nonindependence arises: Certain borrower variables tend to be highly correlated. The second problem is that in eliminating some of the redundancy in the correlated measures, information can often be lost.

If a distance or similarity measure between two points is taken as its simple Euclidean distance (the square root of the sum of the squared differences of each variable), then adding a new variable to the distance measure will change the estimate of similarity. If that new variable is simply a restatement of an existing variable, the distance measure will increase even though no new information is added by that variable. Although this argument has a mathematical basis, the redundancy problem can be examined from a more intuitive basis.
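A small numerical sketch makes the redundancy effect concrete. The two borrowers below are hypothetical points in an already normalized two-variable space, and `euclidean` is an illustrative helper name.

```python
import numpy as np

def euclidean(a, b):
    """Square root of the sum of squared differences of each variable."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

# Two hypothetical borrowers described by two normalized variables.
a = [0.5, -1.0]
b = [-0.5, 1.0]
d_two = euclidean(a, b)          # sqrt(1 + 4)

# Restate the first variable as a "new" third variable: no new information.
a_dup = a + [a[0]]
b_dup = b + [b[0]]
d_three = euclidean(a_dup, b_dup)  # sqrt(1 + 4 + 1)
```

The borrowers are no more different than before, yet the measured distance grows because the first variable is counted twice.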

The problem of nonindependence among variables is primarily thought of as a messy statistical nuisance, but it complicates even a conceptual understanding of the data. Variables that are not independent change together; as one variable changes, the likely range that another variable takes on tends to change as well. Models that utilize only one of these variables might identify effects associated with that variable but, if two nonindependent variables are used for prediction, the effect tends to get blurred between the two variables. This is analogous to the relationship between number of points scored in an intramural basketball league and height of a player. One might notice the same relationship between points and the vertical reach of the player when jumping. Since height of player and jumping reach tend to be related, one might reasonably have difficulty determining whether the scoring relationship is due to height or jumping reach or some combination of the two.

Some companies, for example, develop measures of borrower income. One of these measures might be monthly income as reported by the borrower; another might be the verified income from the underwriter, who calculates income only by using specific rules. A credit modeler might argue that reported income and verified income both measure the same thing and only one of them ought to be included in the credit model. This would, however, be wasting the information provided by the differences between the two variables. Indeed, both income variables do tend to measure the same thing (and hence are highly correlated), but credit evaluators also value the slightly different information provided by both measures and often include both in a credit model.

The problems associated with highly correlated or nonindependent measures are easily resolved mathematically, but the solution can also be seen with the following simple, intuitive example. One can argue that two variables are required to utilize fully the information in two correlated but slightly different variables. The credit modeler could create two new variables; the first might be the average of the verified income and the reported income and the second might be the simple difference between them. The first of these variables represents the major attribute purported to be measured (that is, how much this person makes). The second of these variables represents the disparity between the two measures (a reality check of the borrower’s perception of his income).

In this example, it is important to note that the two highly correlated income variables are easily represented with a single, composite variable reflecting the primary attribute of both, but only with the loss of information. In order to retain all of the information, another variable needed to be added: that part of the information not conveyed in the single, composite representation.

In general, many of the borrower variables tend to be highly correlated. For a statistician, the composite variables used to represent these correlated measures are called principal components.

Concept of a Principal Component Space

In the earlier example there were two different measures of borrower income: that reported by the borrower on the loan application and that determined by the underwriter after investigation. Exhibit 12 represents a typical set of data for these variables and illustrates the composite or principal-component variables discussed earlier. It is with these principal components that the basis of similarity measures among borrowers is established.

The first of the principal components in Exhibit 12 is the line P1, which follows along the major axis of the sample of these two income measures. This line, P1, can be seen to capture the general information in both variables. The data plot shows that as reported income increases, the underwritten income tends to increase, and this general relationship is captured by the line P1. Since it captures the primary relationship between the income measures it is usually considered the “first” principal component of the variables.

This first principal component is related to the correlation between the two variables. The slope of this line P1 is very similar to that which would be predicted by other statistical methods such as regression analysis, which would attempt to extract the relationship between these two variables. Principal component analysis is a much more general tool, though, than regression, in that it can be used to extract the relationships among several variables at once, without some of the assumptions of regression. When the data sample includes more than just two measures, the principal components are determined by the correlations among all of the variables. The first principal component might then be thought of as pointing the direction of the major correlation among all variables.

As the major relationship of these two income variables is represented by the first principal component, a name is usually given that reflects as much. The first principal component in this example of Exhibit 12 might thus be referred to as the income factor or something similar.

As can also be seen in Exhibit 12, there is another line, P2, that is perpendicular to line P1. This corresponds to the second principal component. This (and any subsequent components extracted from higher-dimensional problems) represents that portion of the overall data variance not explained by the first principal component.
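The geometry of P1 and P2 can be reproduced with simulated data. The sketch below is an assumption-laden illustration, not the data behind Exhibit 12: the income figures are randomly generated, and the components are obtained from the correlation matrix of the two normalized variables with `numpy.linalg.eigh`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical incomes: reported income tracks verified income with some noise.
verified = rng.normal(3000.0, 800.0, size=500)
reported = verified + rng.normal(200.0, 300.0, size=500)

X = np.column_stack([reported, verified])
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # normalize both variables
R = np.corrcoef(Z, rowvar=False)           # 2 x 2 correlation matrix

roots, vectors = np.linalg.eigh(R)         # roots come back in ascending order
order = np.argsort(roots)[::-1]            # largest root first
roots, vectors = roots[order], vectors[:, order]

p1 = vectors[:, 0]   # direction of the major axis (first principal component)
p2 = vectors[:, 1]   # perpendicular direction (second principal component)
```

For two normalized variables the first direction weights both incomes equally and the second takes their difference, which parallels the average-and-difference variables of the intuitive example; the roots are 1 plus and 1 minus the correlation.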


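Exhibit 12: Reported versus Underwritten Income

[Figure: scatter plot of reported income (horizontal axis, in thousands) against underwritten income (vertical axis), with the principal-component lines P1 and P2.]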


Conceptual Issues in Factor Weightings

Usually, in principal component analysis, the borrower variables are represented in a table with a factor associated with each of the principal components. An actual factor-weighting matrix which has been sorted by coefficients might be as represented in Exhibit 13. The data in this exhibit are sorted so that the variables are in descending absolute magnitude with respect to their coefficient labeled PC1. Only the six largest principal components (PC1 through PC6) are presented in this exhibit, although 13, in actuality, exist.

Each borrower variable (in its normalized form) is multiplied by the coefficients in the matrix and the column sum provides the value of each principal component. The column PC1 in this exhibit supplies the coefficients for the first principal component, PC2 for the second, and so on. In this way, the 13 variables associated with each borrower allow the borrower to be mapped into the principal component space. This principal component space has the desirable property of each dimension being orthogonal, or perpendicular, to every other dimension. Thus, distance measures for similarity have meaning.
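The multiply-and-sum mapping can be written as a single matrix product. The sketch below uses a hypothetical three-variable coefficient matrix with two orthonormal columns rather than the 13-variable matrix of Exhibit 13; the names `C`, `z`, and `scores` are illustrative.

```python
import numpy as np

# Hypothetical coefficient matrix: rows are borrower variables,
# columns are principal components (orthogonal and of unit length).
C = np.array([
    [2/3, -2/3],
    [2/3,  1/3],
    [1/3,  2/3],
])

# One borrower's variables, already normalized.
z = np.array([1.5, -0.3, 0.6])

# Multiply each variable by its coefficient and sum down each column.
scores_by_column_sum = (z[:, None] * C).sum(axis=0)

# The same operation written as a matrix product.
scores = z @ C
```

Each entry of `scores` is the borrower's coordinate along one principal component, so two borrowers can then be compared by the distance between their score vectors.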

In Exhibit 13, the row “eigenvalue” indicates the relative importance of that principal component in describing the overall variance of the portfolio. In this example, the value associated with PC1 is approximately 3.3, and it accounts for about 25% of the total variance in the data. The second principal component accounts for approximately 12%, and so on.
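The percentages follow directly from the eigenvalues if one assumes, as is usual for normalized variables, that each of the 13 variables contributes one unit of variance, so that the total variance is 13. A quick check with the eigenvalues printed in Exhibit 13:

```python
# Eigenvalues of PC1 through PC6 as listed in Exhibit 13.
eigenvalues = [3.27592546, 1.59581089, 1.12245029,
               0.87204431, 0.84775754, 0.76458779]

# Assumption: 13 normalized variables, each with variance 1.0.
n_variables = 13

# Share of total variance explained by each component, in percent.
shares = [100.0 * ev / n_variables for ev in eigenvalues]
cumulative = sum(shares)
```

Under that assumption the first share works out to 3.27592546 / 13, or about 25.2%, in agreement with the percentage row of the exhibit.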

The coefficient for each variable in each principal component is related to a factor weighting. For these purposes, a factor weighting can be thought of as the weight, or importance, of each of the borrower variables in a principal component or factor. These weights are then used to help interpret or attach a meaning to each of the principal components. As indicated earlier, the variables in Exhibit 13 are sorted by the absolute magnitude of their weightings in PC1.

It can be seen that PC1 has a very high weighting on the variable of verified gross monthly income; all of the other variables have significantly lower weightings. This would lead an analyst to identify PC1 as simply a restatement of that one variable. PC2, however, has relatively high weightings on a combination of variables: verified gross monthly income (once again), cash down-payment, and gross monthly income as reported by the borrower. Thus, an analyst seeking to interpret or put a name to PC2 might call this something like borrower liquidity. PC3 has high weightings in variables associated with age, length of residence, and employment stability, so an analyst would more likely than not give this component an interpretation such as “borrower stability.”

Exhibit 13: Sorted Matrix of Principal Component Coefficients

ID  Name                                      PC1          PC2          PC3          PC4          PC5          PC6
12  Verified Gross Monthly Income      0.52174000   0.48751800   0.00627757  -0.03483180   0.08380410   0.13072800
 6  Owns Residence                     0.16060600   0.03802040   0.32072500   0.18612900  -0.24855400   0.24466200
 1  Age of Borrower                    0.12445100   0.16953500   0.48895700  -0.03895760   0.14511000   0.12878600
 3  Months at Current Residence        0.05925460   0.07275670   0.58234600  -0.03651820  -0.16853800  -0.01605290
 2  Number of Dependents               0.05520240  -0.00042608  -0.03719940   0.46557500  -0.00847272   0.16164700
11  Gross Other Income                -0.03975870   0.07957660   0.02504140   0.02620630  -0.10148700   0.12900700
 9  Cash Down-Payment Amount           0.03835860   0.36421200  -0.09298220   0.06017860  -0.06899970   0.04886870
 4  Months at Current Employment       0.02890860   0.15408600   0.53678000   0.03271890   0.58222700  -0.09446200
 5  Length of Time This Line of Empl   0.02736050   0.17886100   0.49299400  -0.00355669   0.42912300   0.03946330
10  Gross Monthly Income              -0.02062370   0.31272600   0.03990360   0.03179590   0.10827000   0.10466900
 8  Mailing Address Same as Res        0.00819672   0.00906224  -0.07380480  -0.07637980   0.13626000   0.70543500
13  Verified Debt Ratio               -0.00762389  -0.01772890  -0.00920652  -0.00131319   0.00533494  -0.00168223
 7  Works Full-Time                   -0.00139017  -0.00367356   0.16182200   0.07437100   0.41152000  -0.23312800

    eigenvalue                         3.27592546   1.59581089   1.12245029   0.87204431   0.84775754   0.76458779
    % of variance                        25.1994%     12.2755%      8.6342%      6.7080%      6.5212%      5.8814%

These data comprise a subset of variables from a sample analysis and the eigenvalues have been modified for example purposes.


For each principal component, an analyst usually finds a name to help interpret the factor or general borrower characteristic that the mathematics are describing. The casual observer of this exhibit can, with only modest effort, attach somewhat meaningful names to the components and gain an intuitive feel for what that component measures. The true benefit, however, of the principal components is that each of these components is independent of every other principal component, and distance measures between points plotted in this principal component space are well behaved mathematically.

An Expansive View of Borrower Similarity

One of the benefits of having a rich mathematical basis for determining borrower similarity is that extensions to the categorization model are easily made and understood. One particularly valuable extension to the similarity model arises when looking at seasoned portfolios. Those familiar with Bayesian statistics might feel more comfortable than those with only traditional statistical training when jumping so easily between, and changing the definition of, a priori and a posteriori events. If the reader feels uncomfortable about this jump, please bear with the following arguments and consider the approach on its merits.

In a seasoned portfolio, say one that has all loans of age six months or older, the performance of a loan during this common ageing period may provide loan grouping information that is useful in predicting future credit performance. One common practice is to examine a loan’s first-month payment record. Those loans that become 30 days delinquent on the first payment are often intuitively grouped by portfolio purchasers and excluded from purchase.

The mathematics of the approach described here allows several measures of borrower behavior during the ageing period (that the loans have in common) to be used quite simply in these models. The behavioral measures are applied, just as underwriting variables are used, in appraising the similarity of loans. Then, with these additional similarity measures, the historical performance of each loan neighborhood (from that point forward in time) is assessed. Different default rates and delinquency rates are often observed in neighborhoods defined, in part, by these early behavior measures.

It is the flexibility of this data-mining approach that allows each defining characteristic of a loan to be used, as might be appropriate, in obtaining a better and better estimate of expected borrower performance. Questions may be posited for subpopulations of loans (such as: What can be expected of loans aged 40 months or more?) which rely on the history of loans that have already aged beyond 40 months. Then again, the same question can be asked of loans aged 30 months or more. In each case, the behavioral predictors that occur prior to that 40th or 30th month can be used as additional variables in assessing the similarity of loans.

One example of a feature that has been used to aid in grouping loans is associated with delinquency type. Loans, for example, with a payment lapse of 60 days or more within the first N months of their life are tagged and event frequencies counted (N, of course, being chosen as appropriate for the population being examined, as described in the preceding paragraph). Each occurrence is identified as either a lapse with recovery (delinquency followed by a catching up in payments so that the Nth month’s payment due is received by month N) or a lapse with no recovery (delinquency followed by no catching up, and only the (N-2)th month’s payment due, or fewer, has been received by month N).

The number and category of each type of delinquency aids in grouping seasoned loans and enhances predictive ability for the remainder of their life. (For example, 24-month-old loans with two or more 60-day lapse-with-no-recovery delinquencies typically have a higher expected default rate than those with no 60-day delinquencies during their first 24 months, and so forth.)
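One possible way to tag such lapses is sketched below. The data layout is an assumption made for illustration: `cum_payments[m-1]` holds the number of monthly payments received by the end of month m, a loan is treated as 60 days or more behind when it is at least two payments short, and the status is judged at month N. The "unclassified" label for a loan exactly one payment behind at month N is also an assumption, since the text does not name that case.

```python
def lapse_summary(cum_payments, n):
    """Count 60-day lapse episodes in months 1..n and classify the state at month n.

    cum_payments[m-1] is the (assumed) count of payments received by end of month m.
    """
    episodes = 0
    in_lapse = False
    for month in range(1, n + 1):
        behind = month - cum_payments[month - 1]   # payments due but not received
        if behind >= 2 and not in_lapse:
            episodes += 1                          # a new 60-day lapse begins
        in_lapse = behind >= 2

    if episodes == 0:
        status = "no 60-day lapse"
    elif cum_payments[n - 1] >= n:
        status = "lapse with recovery"             # month-n payment received
    elif cum_payments[n - 1] <= n - 2:
        status = "lapse with no recovery"          # at most payment n-2 received
    else:
        status = "unclassified"                    # one payment behind at month n
    return episodes, status

# Hypothetical six-month payment histories.
recovered = lapse_summary([1, 1, 1, 4, 5, 6], 6)
not_recovered = lapse_summary([1, 1, 1, 2, 3, 4], 6)
clean = lapse_summary([1, 2, 3, 4, 5, 6], 6)
```

The episode count and status can then be appended to a loan's variables before similarity is assessed, in the same way as the underwriting variables.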

The benefit of this approach is the unlocking of the richness of information contained in historical payment files. When properly set up and repeatedly updated (with each month’s history added), this approach leads to a dynamic and self-correcting estimation system for loan performance. The calculations are intense, but the value of increased predictive ability usually outweighs the cost of the computer and analysis.

Loan Grouping and Neighborhood Membership Techniques

The aggregation techniques alluded to in the preceding sections are intuitively simple, but actually are quite challenging in implementation. The grid methodology that was used in describing the neighborhoods of Figure 10 suggests that the objective of these techniques is to establish some boundary within which loans are considered sufficiently similar to be grouped. Then, as indicated in the previous sections, these grouped loans are measured for performance attributes (such as default rates) and their group measure is used in some predictive model.

The concept of “sufficiently similar” is the challenge. On an intuitive basis (the actual implementation is described in the section on “Mathematical Aspects: Nearest Neighbors Clustering”), we might seek to choose either some set of loans within a specified boundary in the borrower space or a set of, say, K of the most similar loans to a point, and use that set of loans as the definition of the neighborhood.

It might be appropriate to begin with a gridlike view of the problem. It would be possible to look at each loan and determine the maximum and minimum value observed along each dimension. (Since we are using a principal component space, these will be referred to as PC1, PC2, and so forth up to PCd, where d is the number of dimensions.) The range of values between PC1max and PC1min might be divided into some number of “bins,” say 10, and loans falling into each bin of PC1 could be grouped. If PC2 is also included, then the 10 bins of that dimension together with the 10 bins of PC1 would make a 10-by-10 grid and loans falling into each of the 100 bins would be grouped.

So far, this approach might be seen as tractable. Certainly, some of the grid squares might be sparsely populated, others might be densely populated. If that were the only problem, something might be worked out to normalize the grid boundaries to distribute the borrowers more evenly within the grid. However, consider adding PC3 and its 10 bins to make 1,000 boxes of 3 dimensions, and then PC4 with its 10 bins to make 10,000 hyperboxes of 4 dimensions.

It can quickly be seen that neighborhoods defined with a grid-type approach will soon become unwieldy. The number of neighborhoods using this method for d dimensions will be 10 raised to the dth power. Even if 10 bins were not used, but only 2 values for each variable, this system becomes unusable because the number of grid-neighborhoods grows exponentially with the number of dimensions.
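The growth in the number of grid neighborhoods is easy to tabulate. The sketch below also shows one way a point might be assigned to its grid cell; the equal-width binning and the function names are assumptions made for illustration.

```python
import numpy as np

def grid_cell_count(bins, d):
    """Number of grid neighborhoods with `bins` bins along each of d dimensions."""
    return bins ** d

def grid_cell(point, mins, maxs, bins=10):
    """Bin index of a point along each dimension, using equal-width bins."""
    point, mins, maxs = (np.asarray(v, dtype=float) for v in (point, mins, maxs))
    idx = np.floor((point - mins) / (maxs - mins) * bins).astype(int)
    idx = np.clip(idx, 0, bins - 1)        # the maximum value falls in the last bin
    return tuple(int(i) for i in idx)

cells_2d = grid_cell_count(10, 2)          # 10-by-10 grid
cells_4d = grid_cell_count(10, 4)          # hyperboxes of 4 dimensions
cells_binary_13 = grid_cell_count(2, 13)   # only 2 values for each of 13 variables

example_cell = grid_cell([0.25, 1.0], mins=[0.0, 0.0], maxs=[1.0, 1.0])
```

Even the two-value case yields 8,192 cells for 13 variables, so most cells would hold few or no loans in a portfolio of ordinary size.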

Instead, the preferred approach is to choose the K nearest neighbors to a point as its defined neighborhood. This is illustrated in Exhibit 14 where each point is clustered with its nearest neighbors into a neighborhood, M1, M2, etc. If an originator wishes to make a prediction of a new borrower’s performance, the K nearest neighbors to that new borrower would be identified as its defined neighborhood.

The value of K might be chosen for statistical significance. If K were set at 300, for example, the neighborhood of a new loan, and the basis for predicting how that new loan will behave, is the 300 nearest loans in the database to that new loan. Typical values of K are usually between 100 and 500. If too few loans are chosen to comprise the neighborhood, some estimates may be difficult to make with precision. The larger K, the smaller the number of neighborhoods and the less specific the predictions for the particular borrower. This is a fundamental trade-off that must be considered when using this approach.
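A brute-force version of the K-nearest-neighbors prediction can be sketched as follows. The five loans and their default flags are hypothetical, K is set to 3 only to keep the example readable, and the function name is illustrative.

```python
import numpy as np

def knn_default_rate(new_point, points, defaulted, k):
    """Average default flag of the k historical loans nearest to new_point."""
    points = np.asarray(points, dtype=float)
    diff = points - np.asarray(new_point, dtype=float)
    dist = np.sqrt((diff ** 2).sum(axis=1))      # Euclidean distance to every loan
    nearest = np.argsort(dist)[:k]               # indices of the k closest loans
    rate = float(np.mean(np.asarray(defaulted)[nearest]))
    return rate, nearest

# Hypothetical historical loans in a two-dimensional principal component space.
history = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 5.0]]
defaulted = [1, 0, 1, 0, 0]

rate, nearest = knn_default_rate([0.0, 0.05], history, defaulted, k=3)
```

In practice K would be in the hundreds, and the same neighborhood could supply delinquency rates, prepayment rates, or any other performance attribute recorded for the historical loans.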


For example, if the typical historical database of loans originated over several years is used, the following problem arises. A neighborhood of, say, 100 nearest loans is selected, and all 100 loans have a payment history variable (or conversely, a measure of whether they defaulted) in their first month of life. However, a few of the loans in the database would be only one month old, so the number of loans with a second-month payment history variable will be smaller than 100.

Similarly, because many are yet too young, fewer of the loans will have a payment-history variable for month 4, fewer still for month 20, and fewer still for month 40, and so forth. The number of loans selected for a neighborhood must be adequate to provide enough observations at the point of interest (expected defaults in months 48 to 54 of loan life, for example).

In general, the distance technique has been shown to be superior to other methods of determining neighborhoods. Consequently, in data-mining applications such as this, one often speaks of nearest neighbors as the grouping technique. Even this (as is shown in “Mathematical Aspects: Nearest Neighbors Clustering”) is a computationally difficult approach, but one which is tractable with appropriate algorithmic precautions.


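Exhibit 14: Clustering by Nearest Neighborhoods

[Figure: scatter plot of Variable 1 (horizontal axis) against Variable 2 (vertical axis), with points clustered into nearest-neighbor neighborhoods.]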


MATHEMATICAL ASPECTS: THE LINEAR DISCRIMINANT FUNCTION

The standard credit score problem can be approached as a linear discriminant function for classifying an observation of an unknown population into one of two previously estimated populations (loans that defaulted, and loans that did not default).

Assume that two random samples of size $N_1$ and $N_2$ are drawn as observation vectors from two independent p-dimensional multivariate populations with mean vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$, respectively, and with a common covariance matrix $\boldsymbol{\Sigma}$. We wish to choose a vector $\mathbf{c}$ of coefficients for a “credit score” that acts as a one-dimensional index (or scale) for discriminating between the two populations by some measure of maximal separation. If we begin with a two-sample multivariate $T^2$ statistic, with two sample means $\bar{\mathbf{x}}_1$ and $\bar{\mathbf{x}}_2$ and $\mathbf{S}$ as the pooled estimate of $\boldsymbol{\Sigma}$, we can seek the coefficient vector $\mathbf{c}$ of our index $\mathbf{c}'\mathbf{x}$ that gives the greatest value of

$$t^2(\mathbf{c}) = \frac{N_1 N_2}{N_1 + N_2}\,\frac{\left[\mathbf{c}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)\right]^2}{\mathbf{c}'\mathbf{S}\mathbf{c}}$$

or, equivalently, that maximizes the absolute value $|\mathbf{c}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)|$ subject to the constraint $\mathbf{c}'\mathbf{S}\mathbf{c} = 1$. Solving, using the Lagrangian multiplier $\lambda$, we have

$$\left[(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' - \lambda\mathbf{S}\right]\mathbf{c} = \mathbf{0}$$

and

$$\lambda = \left[\mathbf{c}'(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)\right]^2 = \frac{N_1 + N_2}{N_1 N_2}\,T^2$$

where $T^2$ is once again Hotelling’s familiar statistic. This system of equations has the single solution

$$\mathbf{c} = \mathbf{S}^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$

(as the rank of the coefficient matrix of the system is $p - 1$), and the linear function for discrimination is

$$y = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'\,\mathbf{S}^{-1}\mathbf{x} \tag{A1}$$

or simply, $y = \mathbf{c}'\mathbf{x}$.

If the variables of each observation $\mathbf{x}$ are normalized (with a mean of 0.0 and standard deviation of 1.0), then the elements of $\mathbf{c}$ indicate the relative importance of the corresponding variable to the $T^2$ statistic. However, for credit scoring, this is not usually done. Instead, the coefficients $\mathbf{c}$ are scaled so that $y$ in equation A1 produces a value in the range normally associated with a credit score.
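The coefficient vector of the discriminant function, the inverse of the pooled covariance matrix applied to the difference of the sample means, can be computed directly. The sketch below uses simulated samples; the population parameters, sample sizes, and function name are assumptions made for illustration only.

```python
import numpy as np

def discriminant_coefficients(X1, X2):
    """Return c = S^{-1}(xbar1 - xbar2), with S the pooled covariance estimate."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    c = np.linalg.solve(S, m1 - m2)      # solves S c = (xbar1 - xbar2)
    return c, S, m1, m2

rng = np.random.default_rng(42)
cov = [[1.0, 0.4], [0.4, 1.0]]
# Hypothetical samples: loans that did not default and loans that defaulted.
good = rng.multivariate_normal([1.0, 0.5], cov, size=400)
bad = rng.multivariate_normal([0.0, 0.0], cov, size=150)

c, S, m_good, m_bad = discriminant_coefficients(good, bad)
scores_good = good @ c                   # y = c'x for each observation
scores_bad = bad @ c
```

The two groups' average scores differ by the squared distance between the sample means measured in the metric of the pooled covariance, which is always positive. Rescaling `c` to a conventional score range changes the units of the scores but not their ordering.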

MATHEMATICAL ASPECTS: THE COMPUTATION OF PRINCIPAL COMPONENTS

Principal components, as used in this chapter, make reference to Hotelling’s methods for analyzing the correlation structure of random variables. For purposes herein, these variables need not be independent, nor normally distributed.

Assume that the p random variables $x_1, x_2, \ldots, x_p$ to be observed have a certain multivariate distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ (estimated by $\bar{\mathbf{x}}$ and $\mathbf{S}$ as in the previous section). We further assume that the rank, r, of $\boldsymbol{\Sigma}$ is less than or equal to p, and that the q greatest characteristic roots (or eigenvalues)

$$\lambda_1 > \lambda_2 > \cdots > \lambda_q \tag{A2}$$

of $\boldsymbol{\Sigma}$ are all distinct. We will also presume that all of the observations are standardized so that each sample mean $\bar{x}_i = 0.0$ and each sample standard deviation $s_i = 1.0$, so that $\mathbf{S}$ is also the sample correlation matrix.

The first principal component will be that linear combination

$$y_1 = c_{11}x_1 + c_{21}x_2 + \cdots + c_{p1}x_p = \mathbf{c}_1'\mathbf{x}$$

of the random observation $\mathbf{x}$ whose sample variance

$$s_{y_1}^2 = \sum_{j=1}^{p}\sum_{k=1}^{p} c_{j1}c_{k1}s_{jk} = \mathbf{c}_1'\mathbf{S}\mathbf{c}_1$$

is the greatest over all potential coefficient vectors $\mathbf{c}_i$ normalized in length such that $\mathbf{c}_i'\mathbf{c}_i = 1$.

Using these definitions and the associated constraints, we can find, using the Lagrange multiplier $l_1$ and differentiating with respect to $\mathbf{c}_1$, a solution to the equation

$$\frac{\partial}{\partial\mathbf{c}_1}\left[\mathbf{c}_1'\mathbf{S}\mathbf{c}_1 + l_1(1 - \mathbf{c}_1'\mathbf{c}_1)\right] = 2(\mathbf{S} - l_1\mathbf{I})\mathbf{c}_1 = \mathbf{0}$$

which consists of the p simultaneous equations

$$(\mathbf{S} - l_1\mathbf{I})\mathbf{c}_1 = \mathbf{0}.$$

If $\mathbf{c}_1$ is to be anything other than the trivial solution, $l_1$ must be chosen to be a characteristic root of the covariance matrix $\mathbf{S}$. Since $\mathbf{c}_1$ is that coefficient vector that maximizes the variance, $l_1$ will be the greatest characteristic root, defined as $\lambda_1$ above, and $\mathbf{c}_1$ its associated characteristic vector (or eigenvalue and associated eigenvector, respectively). The coefficients of $\mathbf{c}_1$ are unique only up to multiplication by a scale factor, and if they are scaled, as suggested above, so that $\mathbf{c}_1'\mathbf{c}_1 = 1$, then it is of interest to note that

$$l_1 = \mathbf{c}_1'\mathbf{S}\mathbf{c}_1 = s_{y_1}^2$$

or, the characteristic root $\lambda_1$ is simply the sample variance of $y_1$.

In computing the greatest characteristic root and vector of an observed matrix $\mathbf{S}$, it is useful to recall that one of the attributes of characteristic roots is

$$\mathbf{S}\mathbf{c}_i = \lambda_i\mathbf{c}_i \tag{A3}$$


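Thus, if we arbitrarily create a sequence of vectors

$$\mathbf{v}_n = \mathbf{S}\mathbf{v}_{n-1}, \qquad n = 1, 2, \ldots$$

and choose $\mathbf{v}_0$ such that

$$\mathbf{v}_0 = a_1\mathbf{c}_1 + a_2\mathbf{c}_2 + \cdots + a_p\mathbf{c}_p, \qquad a_1 \neq 0,$$

then we can see that

$$\mathbf{v}_1 = \mathbf{S}\mathbf{v}_0 = a_1\mathbf{S}\mathbf{c}_1 + a_2\mathbf{S}\mathbf{c}_2 + \cdots + a_p\mathbf{S}\mathbf{c}_p$$

and by equation A3

$$\mathbf{v}_1 = a_1\lambda_1\mathbf{c}_1 + a_2\lambda_2\mathbf{c}_2 + \cdots + a_p\lambda_p\mathbf{c}_p.$$

By the nth iteration, this becomes

$$\mathbf{v}_n = a_1\lambda_1^n\mathbf{c}_1 + a_2\lambda_2^n\mathbf{c}_2 + \cdots + a_p\lambda_p^n\mathbf{c}_p$$

or

$$\frac{\mathbf{v}_n}{\lambda_1^n} = a_1\mathbf{c}_1 + a_2\left(\frac{\lambda_2}{\lambda_1}\right)^n\mathbf{c}_2 + \cdots + a_p\left(\frac{\lambda_p}{\lambda_1}\right)^n\mathbf{c}_p.$$

Since $\lambda_1 > \lambda_2 > \cdots > \lambda_p$, it is apparent that, as the number of iterations, n, increases, the terms on the right-hand side of the equation beyond the first will tend toward zero.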

In practice, one can obtain the greatest characteristic (or eigen) vector by finding the sequence

$$\mathbf{S}^2,\quad \mathbf{S}^4 = (\mathbf{S}^2)^2,\quad \mathbf{S}^8 = (\mathbf{S}^4)^2,\quad \ldots$$

and beginning the multiplication with higher powers of the matrix. Since the characteristic vectors are unique only up to a multiplicative scale factor, the standard technique is to divide, at each iteration, the resulting matrix by its largest element. Finally, when the normalized kth power of the matrix differs from the (k+1)th power by less than some predetermined tolerance, the result is scaled such that the length of $\mathbf{c}_1$ is 1, and $\lambda_1$ is found by equation A3.

To then find the second-largest characteristic root and vector, the process is repeated on the reduced matrix

$$\mathbf{S}_1 = \mathbf{S} - \lambda_1\mathbf{c}_1\mathbf{c}_1'$$

and so forth. Although this method works well enough for the first several characteristic roots, numerical accuracy soon degenerates. Consequently, other numerical methods are customarily used when calculating these smaller characteristic roots and vectors.
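The repeated-squaring iteration and the reduction step can be sketched as follows. This is a minimal illustration under stated assumptions: the 3-by-3 matrix is hypothetical, the tolerance and iteration cap are arbitrary choices, and the function name is not from the text. A production system would normally call a library eigenvalue routine instead.

```python
import numpy as np

def leading_root_and_vector(S, tol=1e-12, max_iter=100):
    """Greatest characteristic root and vector of a symmetric matrix S,
    found by repeatedly squaring S and rescaling by its largest element."""
    P = S / np.abs(S).max()
    for _ in range(max_iter):
        P_next = P @ P                          # next power in S^2, S^4, S^8, ...
        P_next = P_next / np.abs(P_next).max()  # divide by the largest element
        converged = np.abs(P_next - P).max() < tol
        P = P_next
        if converged:
            break
    # Columns of the converged matrix are proportional to c1; use the longest one.
    col = P[:, np.argmax(np.linalg.norm(P, axis=0))]
    c1 = col / np.linalg.norm(col)              # scale so that c1'c1 = 1
    lam1 = float(c1 @ S @ c1)                   # from S c1 = lam1 c1
    return lam1, c1

# Hypothetical correlation matrix of three normalized borrower variables.
S = np.array([[1.0, 0.8, 0.3],
              [0.8, 1.0, 0.2],
              [0.3, 0.2, 1.0]])

lam1, c1 = leading_root_and_vector(S)

# Reduced matrix: remove the first component, then repeat for the second.
S1 = S - lam1 * np.outer(c1, c1)
lam2, c2 = leading_root_and_vector(S1)
```

Each reduction carries forward the rounding error of the components already extracted, which is one reason accuracy degrades for the smaller roots.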

It should be pointed out that

$$\sqrt{\lambda_1}\,\mathbf{c}_1$$

is the vector of component correlations, and thus the vector $\mathbf{c}_1$ is often interpreted in terms of each variable’s contribution to the component.

MMAATTHHEEMMAATTIICCAALL AASSPPEECCTTSS::NNEEAARREESSTT NNEEIIGGHHBBOORRSS CCLLUUSSTTEERRIINNGGIn practice, each of the p variables of an observation x consists of a trans-formed and rotated variable, derived by subtracting the mean of eachraw variable associated with a loan or borrower, dividing by the raw vari-able’s standard deviation, then rotating the raw variable vector by atransform matrix consisting of the first p characteristic vectors of thecovariance matrix described in the previous appendix. Given an existingset UU of N observations, a distance measure di can be defined betweenpoints in this space. For example, one such measure might be simply

for any of the existing observations xi in U and an arbitrary point x in this borrower space. If these distances are ordered d1 < d2 < … < dN, then the set of k nearest neighboring points to the arbitrary point x is simply K = {d1, d2, …, dk}. As commonly used for predictive modeling, each of the existing observations xi has associated performance attributes, such as time to default, payment history, etc. The attributes of the set K are commonly taken to be the average of the attributes of each point xi, i ≤ k.
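As an illustration of the neighborhood average just described, the following sketch assumes that the simple distance measure is the ordinary Euclidean distance and that every observation has already been standardized and rotated. The function names and the choice of a single numeric attribute per observation are hypothetical.

```python
import math

def euclidean(x, y):
    """Assumed simple distance measure between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def k_nearest(x, observations, k):
    """Indices of the k observations closest to x, nearest first."""
    order = sorted(range(len(observations)),
                   key=lambda i: euclidean(x, observations[i]))
    return order[:k]

def neighborhood_average(x, observations, attributes, k):
    """Average one performance attribute over the k nearest neighbors."""
    nearest = k_nearest(x, observations, k)
    return sum(attributes[i] for i in nearest) / len(nearest)
```

Note that every call sorts all N distances, which is the cost that makes this direct form impractical for large N.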

However, in implementation the number of comparisons required to repeatedly compute an N by N similarity matrix for large N (say, larger than 50,000) is impractical. Consequently, a hierarchical structure is usually imposed upon the observations.

Some number, N1, of observations is randomly chosen from U, with N1 < N (commonly N1 is chosen to be one or more orders of magnitude smaller than N), to constitute a set M1. Often these selected observations are referred to as “seeds” for “meta loans.”

The distances, dij, between a loan xi ∉ M1 and every meta loan seed xj ∈ M1 are calculated, and the loan is assigned membership in a set K(N1j) associated with the seed xj having the smallest distance dij. In this way, each set K(N1j) is considered a meta loan (or an aggregate of several loans), and contains those loans nearest the meta loan seed.

When seeking to aggregate the k nearest neighboring points to the arbitrary point x, one then makes only N1 comparisons, calculating the distance to each meta loan seed, and, beginning with the smallest distance, aggregates the members of the sets K(N1j) until a sufficient number of points have been obtained.

This algorithm (attributed to Spencer Kimball) can be applied at multiple levels (i.e., K(N1j(N2l)), etc.), and tuned for each data set to save an enormous amount of computing.
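A single level of the seed-and-assign scheme might be sketched as follows. This is an illustrative reading of the description, not the original implementation: the function names are hypothetical, Euclidean distance is assumed, and each seed is assumed to belong to its own meta loan.

```python
import math
import random

def distance(x, y):
    """Assumed Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def build_meta_loans(loans, n_seeds, rng):
    """Pick n_seeds random seeds and assign every loan to its nearest seed.

    Returns a dict mapping each seed index to its member loan indices.
    """
    seed_ids = rng.sample(range(len(loans)), n_seeds)
    # Assumption: a seed is counted as a member of its own meta loan.
    members = {s: [s] for s in seed_ids}
    seed_set = set(seed_ids)
    for i, loan in enumerate(loans):
        if i in seed_set:
            continue
        nearest = min(seed_ids, key=lambda s: distance(loan, loans[s]))
        members[nearest].append(i)
    return members

def approximate_neighbors(x, loans, members, k):
    """Gather at least k candidate neighbors using only seed comparisons."""
    ranked = sorted(members, key=lambda s: distance(x, loans[s]))
    found = []
    for s in ranked:  # nearest meta loan first
        found.extend(members[s])
        if len(found) >= k:
            break
    return found

loans = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
members = build_meta_loans(loans, 2, random.Random(0))
found = approximate_neighbors((0.05, 0.05), loans, members, 2)
```

The result is an approximation: a true nearest neighbor that happens to be assigned to a more distant seed can be missed, which is one reason the number of seeds and levels would need tuning for each data set.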

It is not common practice to use the simple measure described above to determine distances. Instead, various weightings are applied to each of the p dimensions of x to take into account the characteristic root associated with each of the dimensions, the relevance of the dimension to the attribute of interest, and so forth. It is these weighted distances that prove to be most useful in finding clusters of loans for use in predictive models.
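One simple form such a weighting could take is a weighted Euclidean distance, sketched below. The chapter does not specify the functional form, so both the function and the way the weights enter are assumptions; in practice each weight might combine a dimension's characteristic root with a measure of its relevance.

```python
import math

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance with one non-negative weight per dimension."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(x, y, weights)))
```

Setting every weight to 1 recovers the unweighted measure, while a weight of 0 removes a dimension from the comparison entirely.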

SUMMARY

This chapter has examined predictive modeling of credit. It began with a reminder that both the reliability and the validity of an approach need constant examination. From that groundwork it has redeveloped the historical framework for quantitative models known as credit scores.

While simple credit scoring has been shown to be a useful tool, particularly in prime credit markets, it has been found wanting in some applications that require differentiation among borrowers that are not at the extremes (good or bad) of the credit dimension. The historical credit-scoring approach tends to lose validity when applied to marginal credit populations. When differentiating among various groups of borrowers which may not be the best of credits but are not so extremely bad that no credit is extended, other methods have been found more aptly applied, derived from the general field of data mining.

These data-mining methods are shown to rely on the fundamental assumption that similar borrowers tend to behave similarly. There are few assumptions about the orderly transition of bad borrowers into good borrowers. Instead a neighborhood approach is described that attempts to cluster similar borrowers into small groups. The historical performance of these small groups of similar borrowers provides the foundation for predicting the behavior characteristics of new borrowers.

With this method, the actual history of the K nearest neighbors to a new borrower is assessed. The default frequency curve, for example, is determined by comparing the defaults of these K nearest neighbors to the aggregate default frequency of the population as a whole. If it is greater than the population average, then that new borrower is expected to default with similarly greater probability.
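The comparison can be illustrated with a small sketch. It assumes, purely for illustration, that each loan's history is reduced to the month in which it defaulted (or None if it never defaulted) and that the curve is evaluated at a few chosen horizons; the function names are hypothetical.

```python
def cumulative_default_frequency(months_to_default, horizons):
    """Fraction of loans that have defaulted by each horizon (in months)."""
    n = len(months_to_default)
    return [sum(1 for m in months_to_default if m is not None and m <= h) / n
            for h in horizons]

def excess_over_population(neighbor_months, population_months, horizons):
    """Neighbor default frequency minus population frequency, per horizon."""
    near = cumulative_default_frequency(neighbor_months, horizons)
    base = cumulative_default_frequency(population_months, horizons)
    return [a - b for a, b in zip(near, base)]
```

A positive entry at a given horizon indicates that the neighbors defaulted more often than the population as a whole by that month, so the new borrower would be expected to carry correspondingly greater default risk.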

This approach, as developed in this chapter, results in a direct prediction of behavior, rather than the assignment of an intermediate credit score. Such direct prediction allows multiple measures to be used, in addition to the standard default prediction of credit scores. One further advantage of these new data-mining methods is that the technique lends itself to non-monotonic changes in credit quality. It permits a focus on the interior ranges of credit quality, rather than focusing only on the extrema.

As presented in this chapter, new techniques have been developed specifically for making credit predictions about the subprime borrower. These methods provide for increased predictive ability, precisely within the population where the previous credit models have been most disappointing. As these methods are more broadly applied and more widely understood within the subprime market, we can expect greater and greater confidence in this market’s credit forecasts. To the extent that this confidence is shown to be merited, we can expect a reduction in the harsh judgments levied by the market in valuing subprime business.
