datamining-solutions validation of rating system 1

Datamining-Solutions

Validation of Rating System


Outline

What is validation of rating systems?

Two components of validation of rating systems

discrimination,

calibration.

Discrimination methods

Calibration methods



Having set up a rating system, we naturally want to assess its quality.

Validation of a rating system relies on a range of distinct methods to assess its quality.

There are two dimensions along which rating systems are commonly assessed:

discrimination and calibration


Components of validation of rating systems



Discrimination:

In checking discrimination, we ask: How well does a rating system rank borrowers according to their probability of default (PD)?

Calibration:

When examining calibration, we ask: How well do estimated PDs match true PDs?



According to the Basel Committee on Banking Supervision, the following statistical methodologies can be used to assess discriminatory power:

Cumulative Accuracy Profile (CAP),

Accuracy Ratio (AR),

Receiver Operating Characteristic (ROC),

ROC measure (AUC) approximated by Mann-Whitney statistic,

Pietra Index – approximated by Kolmogorov-Smirnov statistic,

Conditional entropy, Kullback-Leibler distance,

Conditional Information Entropy Ratio (CIER),

Information value (divergence, stability index),




Bayesian error rate,

Kendall’s τ and Somers’ D (for shadow ratings),

Brier Score.


Receiver Operating Characteristic (ROC)

                          actual default   actual non-default
predicted default         TP               FP
predicted non-default     FN               TN

(TP, FP, FN and TN as a function of the cut-off)



Example data: one (PD, is_default) pair per borrower, sorted by PD (the slide shows the first rows, with PD 0, 0.1, 0.2, 0.3 and is_default 0). Overall default rate: 16/100 = 0.16. Summary by PD value:

avg_PD  nondefault  default  cum_total  cum_nondef  cum_def  KS      AUC     AR
1       3           4        7%         4%          25%      21%     0.004   0.009
0.9     7           4        18%        12%         50%      38%     0.036   0.050
0.8     5           2        25%        18%         63%      45%     0.069   0.089
0.7     9           3        37%        29%         81%      53%     0.146   0.176
0.6     10          2        49%        40%         94%      53%     0.250   0.281
0.5     8           1        58%        50%         100%     50%     0.343   0.368
0.4     7           0        65%        58%         100%     42%     0.426   0.438
0.3     9           0        74%        69%         100%     31%     0.533   0.528
0.2     13          0        87%        85%         100%     15%     0.688   0.658
0.1     11          0        98%        98%         100%     2%      0.819   0.768
0       2           0        100%       100%        100%     0%      0.843   0.788
total   84          16                                       53.27%  84.26%  68.53%


Receiver Operating Characteristic (ROC) (the first approach)

Matlab file: ROC.m
Call: [xy,AUC]=ROC(ratings, defaults)
Input:
ratings – column vector of rating values (higher values denote higher risk)
defaults – 1 denotes default, 0 denotes non-default
Output:
xy – collection of points for plotting the ROC
AUC – Area Under Curve

Matlab data example file: capexample.mat

The area A is 0.5 for a random model without discriminative power and it is 1.0 for a perfect model. It is between 0.5 and 1.0 for any reasonable rating model in practice.
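The construction of the ROC points and the area under them can be sketched in Python. This is an illustrative sketch, not the contents of ROC.m: the function names roc_points and auc_trapezoid are hypothetical, the data are the bucketed example counts shown earlier, tied ratings are treated as a single point on the curve, and the area is taken with the trapezoid rule.

```python
# Bucketed example data from the slide: (rating, non-defaulters, defaulters).
BUCKETS = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]


def roc_points(ratings, defaults):
    """Return ROC points (false alarm rate, hit rate), riskiest ratings first."""
    n_def = sum(defaults)
    n_non = len(defaults) - n_def
    points = [(0.0, 0.0)]
    cum_def = cum_non = 0
    # One point per distinct rating value, so tied ratings form one segment.
    for value in sorted(set(ratings), reverse=True):
        for r, d in zip(ratings, defaults):
            if r == value:
                cum_def += d
                cum_non += 1 - d
        points.append((cum_non / n_non, cum_def / n_def))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Expand the buckets into one (rating, default flag) pair per borrower.
ratings, defaults = [], []
for rating, n_non, n_def in BUCKETS:
    ratings += [rating] * (n_non + n_def)
    defaults += [0] * n_non + [1] * n_def

xy = roc_points(ratings, defaults)
auc = auc_trapezoid(xy)
print(round(auc, 4))  # 0.8426, matching the 84.26% total in the example table
```

Grouping tied ratings into one point is a design choice: it makes the result independent of the order in which borrowers with the same rating happen to be stored.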


Receiver Operating Characteristic (ROC) (the second approach)

Matlab file: groc.m
Call: [AUC,AUC_m]=groc(pd,is_default,spos,is_pic)
Input:
pd – column vector of rating values
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of three approaches
is_pic – show the curve
Output:
AUC – Area Under Curve using empirical data
AUC_m – Area Under Curve using an approximation


ROC measure (AUC) approximated by Mann-Whitney statistic

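Assumption: The scores of a defaulter and a non-defaulter can be interpreted as realisations of two independent continuous random variables S_D and S_ND.

The area under the ROC curve (AUC) is equal to the probability that S_D produces a smaller rating score than S_ND:

$$AUC = P(S_D < S_{ND})$$

This form assumes a score on which defaulters tend to receive lower values. If higher values denote higher risk, as in the Matlab inputs used here, the inequality is reversed.

This interpretation relates to the Mann-Whitney U-test. We split the sample into defaulters and non-defaulters and calculate the Mann-Whitney test statistic Û, which is defined as

$$\hat{U} = \frac{1}{N_D \cdot N_{ND}} \sum_{(D,ND)} u_{D,ND},$$

where u_{D,ND} is defined as

$$u_{D,ND} = \begin{cases} 1, & \text{if } s_D < s_{ND} \\ 0, & \text{if } s_D \ge s_{ND} \end{cases}$$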


ROC measure (AUC) approximated by Mann-Whitney statistic

Matlab file: AUC_Whitney.m
Call: [U]=AUC_Whitney(ratings, defaults)
Input: ratings, defaults

Output: U – the Mann-Whitney test statistic Û
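The pairwise definition of Û can be illustrated with a short Python sketch. It is not the AUC_Whitney.m implementation: the function name auc_mann_whitney is hypothetical, and two assumptions are made that go beyond the slide. Higher ratings are taken to denote higher risk (so a pair counts when the defaulter's rating is higher), and tied pairs count as 0.5, which the continuous-score assumption rules out in theory but which occurs with discrete ratings.

```python
def auc_mann_whitney(ratings, defaults):
    """Share of (defaulter, non-defaulter) pairs that are ranked correctly.

    Assumptions: higher rating = riskier, ties contribute 0.5 to the count.
    """
    def_scores = [r for r, d in zip(ratings, defaults) if d == 1]
    non_scores = [r for r, d in zip(ratings, defaults) if d == 0]
    u = 0.0
    for s_d in def_scores:
        for s_nd in non_scores:
            if s_d > s_nd:
                u += 1.0      # defaulter rated riskier: correct ordering
            elif s_d == s_nd:
                u += 0.5      # tie: counted as half a correct pair
    return u / (len(def_scores) * len(non_scores))


# Toy data: two defaulters, three non-defaulters, one tie at 0.6.
ratings = [0.8, 0.6, 0.6, 0.4, 0.2]
defaults = [1, 1, 0, 0, 0]
u_hat = auc_mann_whitney(ratings, defaults)
print(round(u_hat, 4))  # 5.5 correctly ranked pairs out of 6 -> 0.9167
```

With the half-weight for ties, this pair count agrees with the trapezoid area under a ROC curve that treats tied ratings as one point.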



Pietra Index – approximated by Kolmogorov-Smirnov statistic

The Pietra Index can be interpreted as the maximum difference between the cumulative frequency distributions of the score values of good and bad cases:

$$\text{Pietra index} = \max \left| F^{good}_{cum} - F^{bad}_{cum} \right|$$

This interpretation makes it possible to perform a statistical test for the difference between these distributions: the Kolmogorov-Smirnov test (KS test) for two independent samples. The null hypothesis is that the score distributions of good and bad cases are identical. If the null hypothesis is rejected, significant differences exist between the rating values of good and bad cases (discrimination).


ROC curve model

Y=normcdf( a + b * norminv(x) )


Pietra Index – approximated by Kolmogorov-Smirnov statistic

Matlab file: pietra.m
Call: [pietra_index,H,confidence_level]=pietra(rating,is_default,q)
Input:
rating – column vector of rating values (higher values denote higher risk)
is_default – 1 denotes default, 0 denotes non-default
q – significance level (default q=0.05)
Output:
pietra_index – asymptotic p-value
H – 1: the null hypothesis was rejected (discrimination); 0: the null hypothesis was not rejected
confidence_level – 1-q (default 95%)
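The maximum-difference statistic itself can be sketched in Python on the bucketed example data from the ROC slide. The function name pietra_index is hypothetical and this is not pietra.m: the sketch computes only the maximum gap between the two cumulative distributions, not the p-value of the KS test.

```python
# Bucketed example data from the slide: (rating, good cases, bad cases).
BUCKETS = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]


def pietra_index(buckets):
    """Maximum gap between the cumulative shares of bad and good cases."""
    total_good = sum(good for _, good, _ in buckets)
    total_bad = sum(bad for _, _, bad in buckets)
    cum_good = cum_bad = 0
    best = 0.0
    # Walk through the ratings from riskiest to safest and track the gap.
    for _, good, bad in sorted(buckets, reverse=True):
        cum_good += good
        cum_bad += bad
        gap = abs(cum_bad / total_bad - cum_good / total_good)
        best = max(best, gap)
    return best


ks = pietra_index(BUCKETS)
print(f"{ks:.2%}")  # 53.27%, the maximum of the KS column in the example table
```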


Cumulative Accuracy Profile (CAP) and Accuracy Ratio (AR)

We present two ways of calculating the Cumulative Accuracy Profile and the Accuracy Ratio:
(1) the first approach: calculation based on ratings sorted into well-known rating classes, e.g. (AA, B, ...),
(2) the second approach: calculation based on a numeric rating that is not mapped to well-known rating classes such as (AA, B, ...).

Characteristics of the first approach (discrete approach):
calculation is based on every rating class, so the curve has as many points as there are rating classes,
advantage: this way is less risky because we avoid the sorting problem,
disadvantage: the curve is less accurate (number of points = number of distinct ratings).

Characteristics of the second approach (continuous approach):
calculation is based on every observation, so the curve has as many points as there are observations,
advantage: the curve is more accurate than in the first approach,
disadvantage: this way is more risky because a sorting problem can occur when several observations share the same rating value.


Cumulative Accuracy Profile (CAP) (the first approach)



The cumulative accuracy profile (CAP) provides a way of visualizing discriminatory power.

The key idea is the following: if a rating system discriminates well, defaults should occur mainly among borrowers with a bad rating.

To graph a CAP, we need historical data on ratings and default behavior.

[Figure: example CAP curve]


Accuracy Ratio (AR) (the first approach)

An accuracy ratio (AR) condenses the information contained in CAP curves into a single number. It can be obtained by relating the area under the CAP but above the diagonal to the maximum area the CAP can enclose above the diagonal. Thus, the maximum accuracy ratio is 1.

We compute the accuracy ratio as A/B, where A is the area pertaining to the rating system under analysis, and B is the one pertaining to the ‘perfect’ rating system.



Matlab file: CAP.m
Call: [xy,AR]=CAP(ratings, defaults)
Input: ratings, defaults

Output:
xy – collection of points for plotting the CAP
AR – Accuracy Ratio
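The ratio A/B can be made concrete with a Python sketch of the discrete approach, using the bucketed example data from the ROC slide. The names cap_points and accuracy_ratio are hypothetical and the code is not CAP.m. It assumes that higher ratings denote higher risk and uses the fact that the perfect CAP rises to 1 at x equal to the default rate, so the area B above the diagonal is 0.5 × (1 − default rate).

```python
# Bucketed example data from the slide: (rating, non-defaulters, defaulters).
BUCKETS = [(1.0, 3, 4), (0.9, 7, 4), (0.8, 5, 2), (0.7, 9, 3), (0.6, 10, 2),
           (0.5, 8, 1), (0.4, 7, 0), (0.3, 9, 0), (0.2, 13, 0), (0.1, 11, 0),
           (0.0, 2, 0)]


def cap_points(buckets):
    """CAP points: (cumulative share of borrowers, cumulative share of defaults)."""
    total = sum(good + bad for _, good, bad in buckets)
    total_bad = sum(bad for _, _, bad in buckets)
    points = [(0.0, 0.0)]
    cum_all = cum_bad = 0
    # Riskiest rating class first; one point per rating class.
    for _, good, bad in sorted(buckets, reverse=True):
        cum_all += good + bad
        cum_bad += bad
        points.append((cum_all / total, cum_bad / total_bad))
    return points


def accuracy_ratio(points, default_rate):
    """AR = A / B, with both areas measured above the diagonal."""
    area_under_cap = sum((x1 - x0) * (y0 + y1) / 2
                         for (x0, y0), (x1, y1) in zip(points, points[1:]))
    area_a = area_under_cap - 0.5          # rating system under analysis
    area_b = 0.5 * (1.0 - default_rate)    # 'perfect' rating system
    return area_a / area_b


xy = cap_points(BUCKETS)
default_rate = sum(bad for _, _, bad in BUCKETS) / sum(g + b for _, g, b in BUCKETS)
ar = accuracy_ratio(xy, default_rate)
print(round(ar, 4))  # 0.6853, matching the 68.53% total in the example table
```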



Cumulative Accuracy Profile (CAP) (the second approach)

Matlab file: cap_continuous.m
Call: [AR]=cap_continuous(pd,is_def)
Input:
pd – column vector of rating values
is_def – 1 denotes default, 0 denotes non-default
Output:
AR – Accuracy Ratio


Conditional entropy, Kullback-Leibler distance

Consider a rating system that, applied to an obligor, produces a random score S. If D denotes the event “obligor defaults” and $\bar{D}$ denotes the complementary event “obligor does not default”, we can apply the information entropy H to P(D|S), the conditional probability of default given the rating score S. The result can be considered a conditional information entropy of the default event,

$$H(P(D|S)) = -\left( P(D|S)\log P(D|S) + P(\bar{D}|S)\log P(\bar{D}|S) \right),$$

and as such is a random variable whose expectation can be calculated. This expectation is called the Conditional Entropy of the default event (with respect to the rating score S), and can formally be written as

$$H_S = E\left[ H(P(D|S)) \right] = -E\left[ P(D|S)\log P(D|S) + P(\bar{D}|S)\log P(\bar{D}|S) \right].$$

The Conditional Entropy of the default event is at most as large as the unconditional Information Entropy of the default event, i.e.

$$H_S \le H(p).$$

For the empirical default rate p = P(D):

$$H(p) = H(P(D)) = -\left( P(D)\log P(D) + P(\bar{D})\log P(\bar{D}) \right).$$



An analytic tool that is closely related to the Cumulative Accuracy Profile is the Receiver Operating Characteristic (ROC). The ROC can be obtained by plotting the fraction of defaulters against the fraction of non-defaulters. The two graphs thus differ in the definition of the x-axis.

A summary statistic of a ROC analysis is the area under the ROC curve (AUC). Reflecting the fact that the CAP is very similar to the ROC, there is an exact linear relationship between the accuracy ratio and the area under the curve:

AR = 2 × AUC − 1



H can be interpreted as a measure of uncertainty (disorder). The difference between H(p) and H_S should be as large as possible, because then the gain of information from applying the rating scores is at its maximum:

$$HPS = H(p) - H_S \to \max$$

Normalising this difference gives the Conditional Information Entropy Ratio (CIER):

$$CIER_S = \frac{H(p) - H_S}{H(p)}$$

The closer the value of CIER is to one, the more information about the default event is contained in the rating scores S.


Conditional Information Entropy Ratio (CIER)

Matlab file: CIER.m
Call: [wsk, HPS]=cier(pd,is_default,spos)
Input:
pd – column vector of rating values (higher values denote higher risk)
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of several approaches
Output:
wsk – CIER, the normalised form of conditional entropy
HPS – the unnormalised form of conditional entropy
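A minimal Python sketch of the CIER calculation for a discrete rating scale follows. It is not the CIER.m implementation: the function names and the input layout (one pair of borrower count and default count per rating class) are assumptions made for illustration, the conditional default probability in each class is estimated by its observed default rate, and base-2 logarithms are used (the base cancels in the ratio).

```python
import math


def entropy(p):
    """Binary information entropy in bits; defined as 0 at p = 0 and p = 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def cier(classes):
    """Return (CIER, HPS) for classes given as (n_borrowers, n_defaults)."""
    total = sum(n for n, _ in classes)
    p = sum(d for _, d in classes) / total
    h_p = entropy(p)                      # unconditional entropy H(p)
    if h_p == 0.0:
        raise ValueError("default rate is 0 or 1, so CIER is undefined")
    # Conditional entropy H_S: class entropies weighted by class size.
    h_s = sum(n / total * entropy(d / n) for n, d in classes)
    hps = h_p - h_s
    return hps / h_p, hps


# Perfect separation: one class contains only defaulters, the other none.
perfect = cier([(50, 50), (50, 0)])
# No information: both classes have the same default rate.
useless = cier([(50, 10), (50, 10)])
print(perfect[0], useless[0])
```

The two toy inputs mark the ends of the scale: a CIER of one when the rating classes fully explain the default event, and zero when they carry no information about it.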


Conditional Information Entropy Ratio (CIER)

The Kullback-Leibler divergence can be computed as follows:

$$D(def \,\|\, nondef) = \sum_s P(s|def)\,\log_2 \frac{P(s|def)}{P(s|nondef)}$$

This expression can be interpreted as an information divergence (information gain, relative entropy) between the score densities of the default and non-default populations.

Because the divergence is asymmetric, it is more convenient to use a symmetric form, the System Stability Index (SSI):

$$SSI = D(def \,\|\, nondef) + D(nondef \,\|\, def)$$

Both values should be as large as possible.



Matlab file: kullback_leibler.m
Call: [D_DN, D_ND, SSI]=kullback_leibler(pd,is_default,spos)
Input:
pd – column vector of rating values (higher values denote higher risk)
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of several approaches
Output:
D_DN – distance from f(D) to f(ND)
D_ND – distance from f(ND) to f(D)
SSI – System Stability Index
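The two directed divergences and their sum can be sketched in Python for score distributions given over the same score classes. This is an illustration, not kullback_leibler.m: the function name kl_divergence and the toy distributions are assumptions, and a class with positive probability under the first distribution but zero under the second is reported as an infinite divergence rather than smoothed.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions."""
    total = 0.0
    for p_s, q_s in zip(p, q):
        if p_s == 0.0:
            continue            # a term with p(s) = 0 contributes nothing
        if q_s == 0.0:
            return math.inf     # p puts mass where q has none
        total += p_s * math.log2(p_s / q_s)
    return total


# Toy score distributions over two score classes (assumed for illustration).
f_def = [0.5, 0.5]      # P(s | def)
f_non = [0.25, 0.75]    # P(s | nondef)

d_dn = kl_divergence(f_def, f_non)
d_nd = kl_divergence(f_non, f_def)
ssi = d_dn + d_nd
print(round(d_dn, 4), round(d_nd, 4), round(ssi, 4))
```

The two directed values differ, which is the asymmetry that motivates the symmetric sum.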


Bayesian error rate

Denote by p_D the rate of defaulters in the portfolio. For a cut-off C, the hit rate HR(C) is the fraction of defaulters classified as defaulters, and the false alarm rate FAR(C) is the fraction of non-defaulters classified as defaulters. In the case of a concave ROC curve, the Bayesian error rate can then be calculated via

$$\text{Error rate} = \min_C \left( p_D\,(1 - HR(C)) + (1 - p_D)\,FAR(C) \right)$$

In the special case p_D = 50%, the error rate is equivalent to the Pietra Index and the Kolmogorov-Smirnov statistic.


Bayesian error rate

Matlab file: bayesian_error_rate.m
Call: [ber]=bayesian_error_rate(rating,is_default,spos)
Input:
rating – column vector of rating values (higher values denote higher risk)
is_default – 1 denotes default, 0 denotes non-default
spos – selects one of several approaches
Output:
ber – Bayesian error rate
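The minimisation over cut-offs can be sketched in Python as a direct search over the observed rating values. The function name and the toy data are assumptions for illustration and the code is not bayesian_error_rate.m. A borrower is flagged as a defaulter when the rating is at or above the cut-off, which assumes that higher ratings denote higher risk.

```python
def bayesian_error_rate(ratings, defaults):
    """Minimum over cut-offs C of p_D * (1 - HR(C)) + (1 - p_D) * FAR(C)."""
    n = len(ratings)
    n_def = sum(defaults)
    n_non = n - n_def
    p_d = n_def / n
    # Candidate cut-offs: every observed rating, plus one above the maximum
    # so that "flag nobody" is also considered.
    cutoffs = sorted(set(ratings)) + [max(ratings) + 1.0]
    best = 1.0
    for c in cutoffs:
        hits = sum(1 for r, d in zip(ratings, defaults) if d == 1 and r >= c)
        alarms = sum(1 for r, d in zip(ratings, defaults) if d == 0 and r >= c)
        hr = hits / n_def       # hit rate at this cut-off
        far = alarms / n_non    # false alarm rate at this cut-off
        best = min(best, p_d * (1 - hr) + (1 - p_d) * far)
    return best


# Toy portfolio with a 50% default rate.
ratings = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
defaults = [1, 1, 0, 1, 0, 0]
ber = bayesian_error_rate(ratings, defaults)
print(round(ber, 4))  # 0.1667
```

In this toy portfolio the largest gap between hit rate and false alarm rate is 2/3, and 0.5 − 0.5 × 2/3 gives the same 1/6, which illustrates the link to the Kolmogorov-Smirnov statistic at a 50% default rate.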


Kendall’s τ and Somers’ D

Kendall’s τ and Somers’ D are so-called rank order statistics, and as such measure the degree of comonotonic dependence of two random variables. The notion of comonotonic dependence generalises linear dependence that is expressed via (linear) correlation. In particular, any pair of random variables with correlation 1 (i.e. any linearly dependent pair of random variables) is comonotonically dependent. But in addition, as soon as one of the variables can be expressed as any kind of increasing transformation of the other, the two variables are comonotonic. In the actuarial literature, comonotonic dependence is considered the strongest form of dependence of random variables.

Kendall's τ compares the number of concordant pairs C minus the number of discordant pairs D with the total number of pairs, n(n-1)/2. This statistic is Kendall's Tau-a:

$$\tau_a = \frac{C - D}{n(n-1)/2}$$

and Somers' D:

$$D_{Somers} = \frac{C - D}{C + D + T}$$

where T counts the pairs tied on pd.

Matlab file: tau_somersd.m
Call: [Tau,SomersD,Tau_a,Gamma]=tau_somersd(pd, is_default)
Input:
pd – vector of estimated default probabilities from the model
is_default – vector of the real PD
Output:
Tau – Tau calculated using the Matlab function corr.m
SomersD – value of Somers' D
Tau_a – Tau calculated in the traditional way
Gamma – the same value as SomersD when no two observations share the same value of pd (T)
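The pair counting behind these statistics can be sketched in Python. This is an illustrative sketch rather than tau_somersd.m: the function name rank_statistics is hypothetical, Gamma is taken to be (C − D)/(C + D), and T is interpreted as the number of pairs that are tied on the score but differ in the outcome, which is an assumption about how the Matlab file treats ties.

```python
from itertools import combinations


def rank_statistics(scores, outcomes):
    """Return (tau_a, somers_d, gamma) from concordant and discordant pairs."""
    c = d = t = 0
    for (x1, y1), (x2, y2) in combinations(zip(scores, outcomes), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx * dy > 0:
            c += 1                      # concordant pair
        elif dx * dy < 0:
            d += 1                      # discordant pair
        elif dx == 0 and dy != 0:
            t += 1                      # tied on the score only
    n = len(scores)
    tau_a = (c - d) / (n * (n - 1) / 2)
    somers_d = (c - d) / (c + d + t)
    gamma = (c - d) / (c + d)
    return tau_a, somers_d, gamma


# Toy data: estimated PDs and observed default indicators.
scores = [0.1, 0.2, 0.2, 0.4]
outcomes = [0, 0, 1, 1]
tau_a, somers_d, gamma = rank_statistics(scores, outcomes)
print(tau_a, somers_d, gamma)  # 0.5 0.75 1.0
```

With a binary outcome and this treatment of ties, Somers' D coincides with the accuracy ratio: the toy data have an AUC of 3.5/4 = 0.875, and 2 × 0.875 − 1 = 0.75.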


Brier Score

The Brier score is a method for evaluating the quality of a probability forecast. It has its origins in the field of weather forecasting, but it is straightforward to apply the concept to rating models. The Brier score is defined as the mean squared error (MSE) of the forecast:

$$\text{Brier score} = \frac{1}{N} \sum_{i=1}^{N} (d_i - PD_i)^2$$

where i indexes the N observations, d_i is an indicator variable that takes the value 1 if borrower i defaulted (0 otherwise), and PD_i is the estimated probability of default of borrower i.

The Brier score lies between 0 and 1,

better default probability forecasts are associated with lower score values.


Brier Score

Matlab file: Brier.m
Call: [out]=Brier(PDs, defaults)
Input:
PDs – vector of estimated default probabilities from the model
defaults – vector of observed default indicators
Output:
out – Brier score
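As a small worked example, the score can be computed in Python as the mean of the squared differences (d_i − PD_i)^2. The function name brier_score and the three-borrower data set are assumptions for illustration, not the contents of Brier.m.

```python
def brier_score(pds, defaults):
    """Mean of squared differences between default indicators and PDs."""
    return sum((d - p) ** 2 for p, d in zip(pds, defaults)) / len(pds)


# Three borrowers: two good forecasts and one defaulter with a low PD.
pds = [0.1, 0.9, 0.2]
defaults = [0, 1, 1]
score = brier_score(pds, defaults)
print(round(score, 4))  # (0.01 + 0.01 + 0.64) / 3 = 0.22
```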


Calibration methods



According to the Basel Committee on Banking Supervision, the following methodologies can be used to assess the quality of the PD estimates:

Binomial test,

Normal test,

Normal test with asset correlation,

Traffic lights approach,

Chi-square test (Hosmer-Lemeshow).


Binomial test

In many rating systems used by financial institutions, obligors are grouped into rating categories. The default probability of a rating category can then be estimated in different ways. Regardless of the way in which a default probability for a rating grade was estimated, we may want to test whether it is in line with observed default rates. From the perspective of risk management and supervisors, it is often crucial to detect whether default probability estimates are too low.

To start, we can assume that defaults are independent (so default correlation is zero). The number of defaults D_kt in a given year t and grade k then follows a binomial distribution. The number of trials is N_kt, the number of obligors in grade k at the start of year t; the success probability is PD_kt, the default probability estimated at the start of year t.


Binomial test

At a significance level α (e.g. α = 1%), we can reject the hypothesis that the default probability is not underestimated if:

$$1 - BINOM(D_{kt} - 1, N_{kt}, PD_{kt}) \le \alpha$$

where BINOM(x, N, q) denotes the cumulative binomial probability of observing at most x successes out of N trials with success probability q. If the above condition is true, we need to assume an unlikely scenario to explain the actual default count D_kt (or a higher one). This would lead us to conclude that the PD has underestimated the true default probability.
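The test condition can be sketched in Python with an explicit cumulative binomial sum. The function names and the example grade (100 obligors, PD of 1%, 5 defaults) are assumptions for illustration, and this is not the Binomial.m implementation.

```python
from math import comb


def binom_cdf(x, n, q):
    """Cumulative binomial probability of at most x successes in n trials."""
    return sum(comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(x + 1))


def binomial_test_pvalue(d, n, pd):
    """Probability of observing d or more defaults if pd is the true PD."""
    return 1.0 - binom_cdf(d - 1, n, pd)


# Hypothetical grade: 100 obligors, estimated PD of 1%, 5 observed defaults.
p_value = binomial_test_pvalue(5, 100, 0.01)
alpha = 0.01
print(p_value <= alpha)  # True: the PD estimate looks too low at the 1% level
```

For zero observed defaults the sum is empty, so the p-value is 1 and the hypothesis is never rejected.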


Normal test

For large N, the binomial distribution converges to the normal distribution, so we can also use a normal approximation to the binomial test condition.

If defaults follow a binomial distribution with default probability PD_kt, the default count D_kt has a standard deviation of:

$$\sqrt{PD_{kt}\,(1 - PD_{kt})\,N_{kt}}$$

The default count's mean is:

$$PD_{kt}\,N_{kt}$$

Instead of the condition based on the binomial distribution, we can now examine:

$$1 - \Phi\left( \frac{D_{kt} - 0.5 - PD_{kt}\,N_{kt}}{\sqrt{PD_{kt}\,(1 - PD_{kt})\,N_{kt}}} \right) \le \alpha$$

where Φ denotes the cumulative standard normal distribution. If the above condition is true, we need to assume an unlikely scenario to explain the actual default count D_kt (or a higher one). This would lead us to conclude that the PD has underestimated the true default probability.
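A short Python sketch of the normal approximation, using the standard library's NormalDist for Φ. The function name and the example grade (1000 obligors, PD of 2%, 35 defaults) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def normal_test_pvalue(d, n, pd):
    """Normal approximation to the probability of d or more defaults."""
    mean = pd * n
    std = sqrt(pd * (1 - pd) * n)
    # The 0.5 is the continuity correction from the test condition.
    return 1.0 - NormalDist().cdf((d - 0.5 - mean) / std)


# Hypothetical grade: 1000 obligors, estimated PD of 2%, 35 observed defaults.
p_value = normal_test_pvalue(35, 1000, 0.02)
print(p_value <= 0.01)  # True: 35 defaults are far above the expected 20
```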


Normal test with asset correlation

If defaults cannot be assumed to be independent (so default correlation is not zero), we have to introduce an asset correlation ρ.

We now examine the following condition, which uses the asset correlation ρ:

$$1 - \Phi\left( \frac{\sqrt{1 - \rho}\;\Phi^{-1}(D_{kt}/N_{kt}) - \Phi^{-1}(PD_{kt})}{\sqrt{\rho}} \right) \le \alpha$$

where Φ^{-1} is the inverse of the standard normal cumulative distribution function.

If the above condition is true, we conclude that the PD estimate was too low given the asset correlation ρ.
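The test with asset correlation can be sketched in Python under the one-factor model that underlies it, where the probability of a default rate of at least D/N is 1 − Φ((√(1−ρ) Φ⁻¹(D/N) − Φ⁻¹(PD)) / √ρ). The function name and the example values are assumptions for illustration, and the sketch requires 0 < D < N because Φ⁻¹ is undefined at 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def correlated_normal_test_pvalue(d, n, pd, rho):
    """P-value of the normal test with asset correlation rho (0 < rho < 1)."""
    if not 0 < d < n:
        raise ValueError("this sketch requires 0 < d < n")
    std_normal = NormalDist()
    z = (sqrt(1 - rho) * std_normal.inv_cdf(d / n)
         - std_normal.inv_cdf(pd)) / sqrt(rho)
    return 1.0 - std_normal.cdf(z)


# Hypothetical grade: default rate 50/1000 = 5%, estimated PD 2%, rho = 0.07.
p_value = correlated_normal_test_pvalue(50, 1000, 0.02, 0.07)
print(p_value <= 0.01, p_value <= 0.05)  # False True
```

In this example a default rate of 5% against a PD of 2% is not rejected at the 1% level once correlation is allowed for, because correlated defaults make high default rates much less surprising.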



Matlab file: Binomial.m
Call: [ALLTest]=Binomial(PD_est,PD,N,alpha,p)
Input:
PD_est – historical default rates, e.g. for the years 1990-2004, for every rating category (grade); the default probability of a rating category
PD – probability of default of every rating grade
N – number of trials in every rating grade
alpha – significance level, e.g. alpha=1%
p – asset correlation, e.g. p=0.07
Output:
ALLTest – matrix of results: first column – binomial, second column – normal, third column – normal with correlation p

Matlab data example file: binDataExample.mat


Traffic lights approach

Decisions on significance levels are somewhat arbitrary. In a traffic lights approach, we choose two or more significance levels rather than one.

If the p-value of a test is below red, we assign an observation to the red zone, meaning that an underestimation of the default probability is very likely. If the p-value is above red but below orange, we interpret the result as a very important warning that the PD might be an underestimate (orange zone). If the p-value is above orange but below yellow, we interpret the result as a warning that the PD might be an underestimate (yellow zone). Otherwise, we assign it to the green zone.

For example, we can assume the following significance levels for the traffic lights approach: red: p ≤ 0.01, orange: (0.01, 0.05], yellow: (0.05, 0.07], green: p > 0.07.
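The zone assignment can be sketched in Python as a simple threshold lookup. The function name traffic_light is hypothetical, the thresholds 0.01, 0.05 and 0.07 are the example levels, and the numeric codes 4 (red) to 1 (green) follow the convention described for the TrafficLight.m output.

```python
def traffic_light(p_value, red=0.01, orange=0.05, yellow=0.07):
    """Map a test p-value to a zone code: 4 red, 3 orange, 2 yellow, 1 green."""
    if p_value <= red:
        return 4   # underestimation of the PD is very likely
    if p_value <= orange:
        return 3   # very important warning
    if p_value <= yellow:
        return 2   # warning
    return 1       # no sign of underestimation


zones = [traffic_light(p) for p in (0.005, 0.03, 0.06, 0.5)]
print(zones)  # [4, 3, 2, 1]
```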



Matlab file: TrafficLight.m
Call: [ALLTraficResult]=TrafficLight(PD_est,PD,N,p)
Input:
PD_est – historical default rates, e.g. for the years 1990-2004, for every rating category (grade); the default probability of a rating category
PD – probability of default of every rating grade
N – number of trials in every rating grade
p – asset correlation, e.g. p=0.07

The file contains the following significance levels for the traffic lights: red ≤ 0.01, orange (0.01, 0.05], yellow (0.05, 0.07], green > 0.07

Output:
ALLTraficResult – matrix of results
Number "4" denotes red light, number "3" denotes orange light, number "2" denotes yellow light, number "1" denotes green light
first column – binomial, second column – normal, third column – normal with correlation

Matlab data example file: binDataExample.mat


Chi-square test (Hosmer-Lemeshow)

Let p_0, …, p_k denote the forecasted default probabilities of debtors in the rating categories 0, 1, …, k. Define the statistic

$$T_k = \sum_{i=0}^{k} \frac{(n_i p_i - \theta_i)^2}{n_i p_i (1 - p_i)}$$

with n_i = number of debtors with rating i and θ_i = number of defaulted debtors with rating i. By the central limit theorem, when n_i → ∞ simultaneously for all i, the distribution of T_k converges towards a χ²_{k+1}-distribution if all the p_i are the true default probabilities.

The p-value of a χ²_{k+1}-test can serve as a measure of the accuracy of the estimated default probabilities: the closer the p-value is to zero, the worse the estimation is.


Chi-square test (Hosmer-Lemeshow)

Matlab file: hosmer_lemeshow.m
Call: p=hosmer_lemeshow(pd,is_default)
Input:
pd – vector of estimated default probabilities from the model
is_default – vector of the real PD
Output:
p – the p-value of the χ²_{k+1}-test
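The statistic T_k can be sketched in Python for grade-level inputs. The function name and the two-grade example are assumptions for illustration, and the sketch stops at the statistic: turning it into a p-value needs the χ² distribution with k+1 degrees of freedom, which the Python standard library does not provide.

```python
def hosmer_lemeshow_statistic(n, p, theta):
    """T_k = sum over grades of (n_i p_i - theta_i)^2 / (n_i p_i (1 - p_i))."""
    return sum((n_i * p_i - t_i) ** 2 / (n_i * p_i * (1 - p_i))
               for n_i, p_i, t_i in zip(n, p, theta))


# Two hypothetical grades: debtors, forecast PDs and observed defaults.
n = [100, 200]
p = [0.01, 0.05]
theta = [3, 10]
t_k = hosmer_lemeshow_statistic(n, p, theta)
print(round(t_k, 4))  # 4.0404: only the first grade deviates from its forecast
```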
