Macroeconomic and Financial Management Institute of Eastern and Southern Africa
AN ASSESSMENT OF CREDIT RISK RATING MODELS IN
RELATION TO BASEL II PROVISIONS: A CASE FOR ZIMBABWE
Bob Takavingofa
Reserve Bank of Zimbabwe
January, 2012
A Technical Paper submitted in Partial Fulfilment of the Award of
MEFMI Fellowship
Abstract
Credit risk rating systems now play a central role in banking institutions across the globe to
the extent that their performance has a direct impact on profitability and soundness of the
institutions. This has prompted vast research into the field.
In this paper we assess the performance of credit rating systems in Zimbabwe and derive
generalizations for the MEFMI region. We use an in-depth study of two banking
institutions’ internal credit risk rating systems. Various credit risk validation tests that are
currently applied in industry are explored and applied. The study concludes that proper
credit risk rating system methodology and governance are essential for the continued
relevance of the credit assessment process.
Page iii of 63
Table of Contents

Abstract .................................................................................................................................. ii
List of Figures ....................................................................................................................... iv
List of Tables ......................................................................................................................... v
Acknowledgements ............................................................................................................... vi
Chapter 1: Introduction .......................................................................................................... 1
Chapter 2: Literature Review ................................................................................................. 6
Chapter 3: Methodology ...................................................................................................... 35
Chapter 4: Results ................................................................................................................ 45
Chapter 5: Conclusion.......................................................................................................... 52
Bibliography ........................................................................................................................ 55
List of Figures

Figure 1: Cumulative Accuracy Profiles .............................................................................. 19
Figure 2: Distribution of Defaulters and Non-defaulters ..................................................... 21
Figure 3: Plot of Internal vs External Rating System .......................................................... 33
Figure 4: Ratings Distribution by Class ............................................................................... 37
Figure 5: Ratings Distribution by Class ............................................................................... 39
Figure 6: CAP for Ratings Data from Bank 1...................................................................... 46
Figure 7: CAP for Ratings Data from Bank 2...................................................................... 50
List of Tables

Table 1: Transition Matrix ..................................................................................................... 9
Table 2: Rating System Validation Dimensions .................................................................. 10
Table 3: Crude Default Rates ............................................................................................... 37
Table 4: Client Quality Classification .................................................................................. 38
Table 5: Migration Matrix for Bank 2 Credit Portfolio ....................................................... 49
Table 6: Migration Matrix ................................................................................................... 45
Table 7: Results of the Binomial Test .................................................................................. 47
Table 8: Results of the One Factor Test ............................................................................... 48
Table 9: Binomial Test for Bank 2’s Rating System ........................................................... 51
Acknowledgements
I would like to extend my sincere thanks to my mentor, Dr. Mudavanhu, for all the guidance in coming up with this technical paper.
I also thank Messrs Mataruka, Chirozva and Chiviri for all the encouragement through the
fellowship program.
I thank the MEFMI secretariat most sincerely for funding my customised training
program. I appreciate Messrs Ngalande, Ncube, Kavuma and Namagoa, Mrs. Makamba and
the entire MEFMI family for believing in me and offering me the distinguished opportunity
to be part of the fellowship program.
I would also like to acknowledge and thank Professor Petersen for the guidance at the start
of this work.
To my wife, Samukile and daughter, Thandeka, I appreciate your endurance and support. I
love you guys.
Above all I thank God, the father of our Lord Jesus Christ who has done all this, Amen.
List of Acronyms and Abbreviations
AIGV Accord Implementation Group Validation
AIG Accord Implementation Group
AR Accuracy Ratio
AUC Area Under the Curve
BCBS Basel Committee on Banking Supervision
BIS Bank for International Settlements
CAP Cumulative Accuracy Profile
EAD Exposure at Default
EDF Expected Default Frequency
IRB Internal Rating Based
LGD Loss Given Default
MEFMI Macroeconomic and Financial Management Institute
PD Probability of Default
PDNU Probability of Default Not Underestimated
PDU Probability of Default Underestimated
ROA Return on Assets
ROC Receiver Operating Characteristic
ROE Return on Equity
TTC Through-the-Cycle
PIT Point-in-Time
Chapter 1: Introduction
1.1 Background
One of the core functions of banking institutions is that of transforming short-term
liabilities into long-term assets for a premium that is commensurate with the risk inherent in
the created assets. This critical role of maturity transformation enables banks to allocate
capital resources from surplus units of the economy to deficit units, thus facilitating
economic activity. The business model of banking institutions, however, exposes them to
various risks that include credit risk. Credit risk is the risk of loss arising from a borrower
or counterparty not performing according to agreed terms.
The major challenge for banking institutions in their intermediary role is the identification
of creditworthy borrowers whose economic activities will survive beyond the tenure of
outstanding loans. The insolvency of a banking institution’s borrowers can cause
devastating consequences as was witnessed during the global financial crisis that started in
the United States (2007-2009). Further, credit risk can induce the crystallisation of other
risks such as liquidity, reputational and earnings risk. Thus banks aim for high quality
borrowers when extending credit. Extending credit to less creditworthy borrowers may,
however, be used as a strategy for higher returns as long as there are proper mitigation
strategies in place and the bank is aware of the level of credit risk inherent in the loan.
Banking institutions have thus developed tools to help them discriminate between high
quality borrowers and low quality borrowers. The tools now range from credit risk
scorecards to more complex rating systems. The credit risk models employed are now
being used for loan granting and risk pricing of the loans to ensure that banks are
compensated for the credit risk they assume. The tools are generally referred to as credit
risk rating systems. Colquitt (2007) asserts that internal credit risk rating systems have
become the cornerstone to managing a range of credit functions and now also serve as a
framework for management credit decisions in banking institutions.
The need for credit rating systems is underscored by the fact that usually more than 70% of
total balance sheet assets of a banking institution are loans, with direct implications for
exposure to credit risk. This was recognised in 1988 by the Basel Committee on Banking
Supervision (BCBS) when they released the first capital accord titled “International
Convergence of Capital Measurement and Capital Standards”, popularly known as Basel I,
which required internationally active commercial banking institutions to hold capital
commensurate with their credit risk profile. With the fast-paced evolution of financial markets
and the growing complexity of banking risks, which encouraged advances in banking institutions'
risk management systems, the BCBS further revamped its capital computation standards in 2004.
The BCBS published the “Revised International Convergence of Capital Measurement and
Capital Standards Framework” (Basel II), which increased reliance on banking
institutions' internal estimates of creditworthiness (see BCBS, 2004). This was an
appreciation by BCBS that banking institutions had developed internal credit risk rating
models that are efficient in forecasting borrower default in loan granting and monitoring.
The overarching objective of the revision was to try and mirror the complexity that banks
use when calculating their economic capital for credit risk in the computation of regulatory
capital. The Revised Capital Accord gives provisions under the advanced approaches for
banks, after meeting minimum qualitative and quantitative requirements prescribed in the
accord, to use their internal models to compute capital. For credit risk banks are allowed to
use the Internal Rating Based (IRB) approach, which has the Foundation Approach and the
Advanced Approach. Under the Foundation Internal Rating Based approach banks provide
their own estimate of the Probability of Default (PD) in the Supervisory formula used to
calculate regulatory capital for credit risk. It is thus critical to note that the accuracy of
generated ratings is of paramount importance for banking institutions operating under IRB
approaches, as it has a direct implication for the sufficiency of the regulatory capital required
to cover unexpected losses. Informed by observations made during the global financial
crisis the BCBS re-emphasised the need for robust credit risk capital parameters, especially
PDs, in their Basel III publications.
Internal rating models used by banking institutions for credit granting and pricing,
however, expose banks to model risk. Model risk is the risk of erroneous results or
interpretations due to errors in the model or in the overall modelling process. Model risk is
cited as one of the chief causes of the subprime crisis in the United States as some experts
argue that financial institutions were using erroneous models due to the wrong assumptions
they were developed from. Experts further assert that the crisis was also compounded by
the lack of supervisory oversight of model risk, as supervisors did not understand some of
the models that banks were using. The threat of model risk materialising is mitigated by
sound validation techniques. The BCBS, in the context of rating systems, defines the term
“validation” as encompassing a range of processes and activities that contribute to an
assessment of whether ratings adequately differentiate risk, and whether estimates of risk
components (such as PD, Loss Given Default (LGD), or Exposure at Default (EAD))
appropriately characterise the relevant aspects of risk. Validation of credit risk rating
systems is vital as it iteratively assesses and ensures that the systems perform according to
intended objectives. Credit risk validation provides a structured approach to assessing both
the discriminatory and calibration power of rating systems. Failure to run value-adding
validation programs for rating systems may lead to under- or over-estimation of the credit risks
inherent in loans. Under-estimation leads to under-pricing of loans, which prejudices the bank
bank as it would not be appropriately compensated for the risks undertaken and may result
in the institution’s credit portfolio being composed mainly of low-quality borrowers.
Over-estimation leads to over-pricing of loans, which may result in an exodus of high-quality
borrowers to banks where they get lower pricing in relation to their risk profile.
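The pricing effect described above can be illustrated with the standard expected-loss identity, EL = PD × LGD × EAD. The sketch below shows how an under-estimated PD flows through to an inadequate credit spread; all figures are hypothetical, chosen for illustration rather than taken from the paper.

```python
# Illustrative sketch: a mis-estimated PD feeds through the expected-loss
# identity EL = PD x LGD x EAD into the spread a bank charges. All numbers
# are hypothetical.

def expected_loss(pd_, lgd, ead):
    """Expected credit loss for a single exposure."""
    return pd_ * lgd * ead

ead = 1_000_000   # exposure at default (hypothetical loan)
lgd = 0.45        # loss given default (hypothetical)

true_pd = 0.030   # the borrower's true one-year PD
model_pd = 0.015  # an under-estimated PD from a poorly validated model

true_el = expected_loss(true_pd, lgd, ead)     # 13,500
priced_el = expected_loss(model_pd, lgd, ead)  # 6,750

# Spread (in basis points of EAD) needed just to cover expected loss:
true_spread_bp = 1e4 * true_el / ead       # 135 bp
priced_spread_bp = 1e4 * priced_el / ead   # 67.5 bp

shortfall = true_el - priced_el  # compensation the bank forgoes per year
print(true_spread_bp, priced_spread_bp, shortfall)
```

Here the bank charges roughly half the spread its true risk warrants, which is exactly the prejudice to the bank that under-pricing causes.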
Against this background, this paper discusses the key issues involved in the
validation of credit risk rating systems. Emphasis will be placed on understanding the
validation techniques.
1.2 Problem Statement
Jurisdictions in the MEFMI region are at various stages of implementing Basel II and III, in
order to accurately quantify their financial risks and determine the appropriate regulatory
capital requirements. Accordingly, banks in MEFMI member states have begun to devote
resources towards the development of internal credit risk rating systems in line with current
international best practice. These efforts have been recognized and encouraged by bank
regulators in the region as noted by their phased implementation of Basel II and III.
Zimbabwe has even gone further to modify the BCBS prescribed standardised credit risk
approach, and the Reserve Bank of Zimbabwe’s Guideline No.1-2011/BSD “Technical
Guidance on the Implementation of Basel II in Zimbabwe” makes it mandatory for every
banking institution operating in Zimbabwe to have an operational internal rating system.
The modification was done in view of the fact that implementing the Basel II credit risk
standardised approach in its raw form was effectively equivalent to maintaining Basel I
for the corporate portfolio. This is so because no corporations other than banking
institutions are rated by credible external rating agencies.
Some banking institutions in Zimbabwe, however, do not conduct regular validations of
their internal credit risk rating systems, and discussions with some of the bank managers
revealed that some of the rating systems have not been validated since inception owing to a
lack of skills and appreciation of the value brought by validation exercises. Therefore the
major challenge for both banks and regulators lies in the evaluation of the accuracy of the
model’s forecasts of credit losses.
The implementation of a new standard such as Basel II and III in any jurisdiction is always
a learning process for both supervisors and banks. It is therefore critical for supervisors to
be ahead of the market by developing skills in the assessment and maintenance of rating
systems, including their shortfalls, so as to have oversight of the model risk inherent in their
financial systems.
1.3 Objectives
From the foregoing, the following objectives are set:
(i) To explore sound credit risk rating system methodologies and processes that are used
by banks globally.
(ii) To explore sound credit risk rating system validation techniques as mitigants against
model risk.
(iii) To apply sound credit risk rating system validation techniques in assessing how well
the credit risk rating models used by two banks operating in Zimbabwe forecast
defaults.
1.4 Significance of Study
Banking institutions are now using credit risk rating systems for credit granting and pricing
of loans. Further, Principles 19 to 21 of the BCBS Core Principles for Effective Banking
Supervision require supervisors to have a supervisory review process that reviews all functional
areas of a banking institution. In this regard, it is imperative for supervisors to develop an
understanding and skills in the validation of rating systems. The study will focus on credit
risk rating system validation.
The purpose of the study is to raise awareness among both regulators and bankers of the
importance and value added by implementing sound iterative validation programs for rating
systems. This will be done using an in-depth case study of two Zimbabwean banking
institutions whose rating systems have been operational for many years.
1.5 Limitations of the Study
The major limitation of the research is the sufficiency and soundness of data, including the lack
of appropriate benchmark rating systems. This limits the range of validation techniques
that can be undertaken. Further, the MEFMI region is a collection of emerging markets that
are characterised by the absence of a secondary market for credit related securities such as
credit default swaps. Such instruments enable benchmarking of internal ratings to market
implied default rates. The spectrum of borrowers in some sectors of the economy in
Zimbabwe as well as other MEFMI countries is limited, thus limiting the number of
observations for some rating categories which may result in biased computed estimates
such as PDs. Zimbabwe, like other MEFMI countries, does not have credible external rating
agencies against which banks can benchmark their internal obligor ratings. In addition, some
countries in the MEFMI region do not have reliable audited statements for all operating firms.
This introduces challenges for credit analysts and may result in an inconsistent rating
methodology.
Chapter 2: Literature Review
2.1 Introduction
Credit risk rating systems have gained prominence over the years as efficient tools that aid
banking institutions in discriminating borrowers according to their forecasted credit quality.
Adverse selection in credit extension, as was the case just before the global financial crisis,
exposes a bank not only to credit risk but also to liquidity risk, as default causes
balance sheet liquidity timing dislocations.
In this section we review the literature on credit rating systems and validation
methodologies. We discuss common credit risk assessment models used in practice to gain
understanding of their strengths and weaknesses in line with our set objectives outlined in
chapter 1. We then discuss validation techniques and benchmarking of internal rating
systems before concluding.
2.2 Credit Risk Assessment Techniques
The main objective of credit assessment personnel and credit risk managers is the
prediction of default probability of obligors based on available information on the obligor
at the start and throughout the life of a credit relationship. Generally, borrowers have more
information about their credit quality at any given point of a credit relationship than their
lenders. It is in this regard that banks, in view of this information asymmetry, limit
borrowers’ access to the bank’s credit, rather than allowing borrowers to select the sizes of
their loans without restriction. The limit on the credit granted to borrowers must however
be based on the level of default likelihood of the borrower so as to avoid adverse selection.
Thus banking institutions employ various approaches in designing credit rating systems that
discriminate between creditworthy borrowers and likely defaulters. Some of the approaches
include the following techniques:
(a) Econometric Techniques
Econometric techniques such as linear and multiple discriminant analysis, multiple
regression, logit and probit analysis all can be used to model the probability of default of
obligors. The independent variables include financial ratios and other indicators as well as
external variables used to measure economic conditions. Survival analysis refers to a set of
techniques used to measure the time to response, failure, death, or the development of an
event. These models have the weakness of being data-hungry in calibrating the model
parameters, and results are usually dependent on the data set used.
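A minimal sketch of the logit approach named above may help: a logistic model of default fitted by plain gradient descent. The data are synthetic and the feature names (leverage, interest cover) and coefficients are invented for illustration, not drawn from any bank's model.

```python
import numpy as np

# Hedged sketch of a logit PD model fitted to synthetic data.
rng = np.random.default_rng(0)
n = 500
leverage = rng.uniform(0.0, 1.0, n)   # hypothetical financial ratio
coverage = rng.uniform(0.5, 5.0, n)   # hypothetical interest cover

# Assumed true default process: high leverage, low coverage -> default.
logit = -2.0 + 3.0 * leverage - 0.8 * coverage
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), leverage, coverage])
w = np.zeros(3)
for _ in range(5000):                  # plain gradient descent on log-loss
    p = 1 / (1 + np.exp(-X @ w))       # predicted PDs
    w -= 0.1 * X.T @ (p - y) / n       # gradient step

pd_hat = 1 / (1 + np.exp(-X @ w))
# A sound model should, on average, rank defaulters above non-defaulters:
print(pd_hat[y == 1].mean(), pd_hat[y == 0].mean())
```

Note the data dependence the text warns about: refitting on a different sample (a different seed) yields different coefficients, which is precisely why such models need ongoing validation.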
(b) Neural Networks
These are computer-based systems that try to mimic the functioning of the human brain by
emulating a network of interconnected neurons (the smallest decision making units in the
brain). They use the same data employed in the econometric techniques but arrive at the
decision using alternative implementations of a trial and error method. However, the
disadvantages of neural network systems include:
(i) The time and effort required to translate the human experts’ decision processes
into a system of rules may be enormous.
(ii) The difficulty and costs associated with programming the decision algorithm and
maintaining the system.
(c) Optimization models
These mathematical programming techniques discover the optimum weights for obligor
and loan attributes that minimize lender selection error and maximize profits. Such systems
have disadvantages which include difficulty in obtaining closed form solutions and at times
are difficult to implement.
(d) Rule-based or expert systems
These mimic in a structured way the process used by an experienced analyst to arrive at the
credit decision. As the name indicates, such a system tries to clone the process used by a
successful analyst so that this expertise is available to the rest of the organization. Rule-
based systems are characterized by a set of decision rules, a knowledge-base consisting of
data such as industry financial ratios, and a structured inquiry process to be used by the
analyst in obtaining data on a particular borrower.
Although many banking institutions still use expert systems as part of their credit decision
process, these systems have two main shortfalls:
(i) Consistency. In assessing inherent credit risk there are various evaluation parameters
and factors that can be used. This then creates a problem of consistency and thus two
expert systems that are based on different assessment parameters are incomparable.
(ii) Subjectivity. The weights applied to the factors and parameters are subjective as they
are based on the expert assessing the borrower and can vary from borrower to
borrower making comparability of rankings very difficult.
(e) Hybrid Systems
These systems use direct computation, estimation and simulation. These are partly driven
by a direct causal relationship, the parameters of which are determined through estimation
techniques. An example of this is the KMV model, which uses an option theoretic
formulation to explain default and then derives the form of the relationship through
estimation. Some of the systems have the disadvantage of being overly complex and
difficult to implement such as the KMV model which requires advanced expertise in
quantitative techniques to implement.
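The option-theoretic formulation behind KMV-style hybrid models can be sketched with Merton's distance-to-default, which counts how many standard deviations the firm's asset value sits above its default point; the model-implied PD is the normal tail beyond that distance. The inputs below are hypothetical.

```python
import math

# Sketch of the Merton (1974) idea underlying KMV-style hybrid models:
# equity is a call option on firm assets, and default occurs when asset
# value falls below the debt level at the horizon.

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def distance_to_default(V, D, mu, sigma, T=1.0):
    """V: asset value, D: default point (debt), mu: asset drift,
    sigma: asset volatility, T: horizon in years."""
    return (math.log(V / D) + (mu - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))

V, D, mu, sigma = 120.0, 100.0, 0.05, 0.25   # hypothetical inputs
dd = distance_to_default(V, D, mu, sigma)
edf = norm_cdf(-dd)   # model-implied PD ("expected default frequency")
print(dd, edf)
```

In practice the asset value and volatility are themselves unobservable and must be backed out from equity data, which is where the advanced quantitative expertise mentioned above comes in.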
After an initial rating has been assigned to a borrower at the start of a credit relationship,
continual monitoring of default likelihood should be done at least annually, or as and when
significant information that affects the credit risk profile of the bank is received. The
likelihood of migration or transition across credit risk grades can be assessed
over time using transition or migration matrices.
2.2.1 Transition Matrices
Migration probability matrices are data summaries that help to predict the tendency of a
credit to migrate to lower or higher credit quality based on historically observed migration
patterns. These matrices are derived by using the cohort component analysis, i.e. observing
a group of similar credit quality borrowers through time from inception to end of a credit
relationship. Thus transition matrices help to answer questions such as “With what
probability will the credit risk rating of a borrower decrease by a given degree?” Consider a
rating system with two rating classes A and B, and a default category D. The transition
matrix for this rating system is a table listing the probabilities that a borrower rated A at the
start of a period has rating A, B or D at the end of the period; analogously for B-rated
companies. The table below illustrates the transition matrix for this simple rating system:
Table 1: Transition Matrix
The length of period of observing transitions in credits is often set to one year, but other
choices are possible, such as three months. The default category does not have a row of its
own as it is treated as an absorbing category, i.e. probabilities of migrating from D to A and
B are set to zero. A borrower that moves from B to D and back to B within the period will
still be counted as a defaulter. If we counted such an instance as ‘stay within B’, the
transition matrix would understate the danger of experiencing losses from default. The
stability of the transition matrix is dependent on the rating philosophy of the bank.
2.2.2 Rating Philosophy
The first step when developing a conceptually sound credit risk rating framework is to
decide what the credit rating should indicate, thus the rating philosophy. It is very
important for banks to decide whether they want their internal rating systems to grade
borrowers according to their current condition (point-in-time, PIT) or their expected
condition over a cycle (through-the-cycle, TTC) because the rating philosophy influences
many aspects such as: credit approval, loan pricing, early warning of defaults, volatility and
procyclicality of regulatory and economic capital, and as a result the profitability of a bank
and its competitive position.
Banks whose ratings are used primarily for underwriting purposes are likely to implement
systems that are TTC. TTC ratings will tend to remain more-or-less constant as
macroeconomic conditions change over time. On the other hand, banks whose ratings are
                             Rating at end of period
                             A                            B                            D
Rating at       A            Probability of staying       Probability of migrating     Probability of default
beginning                    in A                         from A to B                  from A
of period       B            Probability of migrating     Probability of staying       Probability of default
                             from B to A                  in B                         from B
used for pricing purposes or to track current portfolio risk are more likely to implement PIT
rating systems. PIT ratings will tend to adjust quickly to a changing economic environment.
Between these two extreme cases lie hybrid rating systems that embody characteristics of
both PIT and TTC rating philosophies. To effectively validate pooled PDs, supervisors and
risk managers will need to understand the rating philosophy applied by a bank in assigning
obligors to risk buckets.
2.3 Validation Techniques
The objective of validating a rating system is to assess whether a rating system can, and
ultimately does, fulfil its task of accurately distinguishing and measuring credit risk. There
are two dimensions along which ratings are commonly assessed, that is, discrimination and
calibration.
In checking discrimination, we assess how well a rating system ranks borrowers according
to their true probability of default (PD). When examining calibration we assess how well
the estimated PDs match realised PDs. The following example in Table 2 below illustrates
the two dimensions of rating quality assessment.
Table 2: Rating System Validation Dimensions
Borrower    Rating of System 1 (PD)    Rating of System 2 (PD)    Actual PD
B1          A (1%)                     A2 (2.01%)                 1.50%
B2          B (5%)                     B2 (2%)                    2%
B3          C (20%)                    C2 (1.99%)                 2.50%
From the table above we note that the rank ordering of rating system 1 is perfect, but the
PDs differ significantly from the true PDs. By contrast, the average PD of rating system 2
closely matches the average true PD, and individual deviations from the average PD are
small. However, it does not discriminate at all as the system’s PDs are inversely related to
the true PDs. There are various techniques that are used in practice to assess either the
discrimination or calibration power or both dimensions simultaneously for a rating system.
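The two dimensions illustrated in Table 2 can be checked mechanically. The sketch below uses the table's own PDs, with a simple rank-order comparison for discrimination and mean absolute deviation as a crude calibration measure; the helper functions are illustrative, not standard validation statistics.

```python
# Discrimination vs calibration, using the PDs from Table 2.
actual = [0.015, 0.020, 0.025]      # true PDs of borrowers B1..B3
system1 = [0.01, 0.05, 0.20]        # perfect ranking, poor PD levels
system2 = [0.0201, 0.020, 0.0199]   # good PD levels, inverted ranking

def same_rank_order(a, b):
    """Discrimination check: do the PDs rank borrowers identically?"""
    rank = lambda xs: sorted(range(len(xs)), key=lambda i: xs[i])
    return rank(a) == rank(b)

def mean_abs_error(a, b):
    """Crude calibration check: average absolute PD deviation."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(same_rank_order(actual, system1))   # True: system 1 discriminates
print(same_rank_order(actual, system2))   # False: system 2 ranks inversely
print(mean_abs_error(actual, system1))    # large: system 1 poorly calibrated
print(mean_abs_error(actual, system2))    # small: system 2 well calibrated
```

Production validation would use the CAP/accuracy ratio and statistical calibration tests discussed later, but the same two questions are being asked.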
2.3.1 Provisions by BCBS
The advent of Basel II in 2004 linked the credit risk regulatory capital requirements of banks
to banking institutions’ own internal estimates of credit risk parameters (PD, EAD and
LGD). The success of the revised accord’s credit risk advanced approaches was thus
hinged on the continued accuracy and consistency of the estimated parameters. To this end the
BCBS established the Subgroup on Validation called the Accord Implementation Group on
Validation (AIGV) in 2004. The objective of the AIGV is to share and exchange views
related to the validation of rating systems. This is aimed at ensuring that banks implement
robust and efficient validation mechanisms that enable the continued soundness of the
rating systems. In this research we will, however, only discuss validation of PDs. The
AIGV developed six important principles on validation that resulted in a broad framework
for validation. The principles were published in a BCBS newsletter of January 2005 (see
BCBS, 2005, Blockwitz and Hohl, 2006). The validation framework covers all aspects of
validation, including the goal of validation (principle 1), the responsibility for validation
(principle 2), expectations on validation techniques (principles 3, 4, and 5), and the control
environment for validation (principle 6). The validation principles are outlined below.
Principle 1: Validation is fundamentally about assessing the predictive ability of a bank’s
risk estimates and the use of ratings in credit processes
The two-step process for rating systems requires banks to firstly discriminate adequately
between risky borrowers (i.e. being able to rank borrowers by their associated
risk of loss) and secondly to calibrate risk (i.e. being able to accurately quantify the level of
risk). The IRB parameters must, as always with statistical estimates, be based on historical
experience which should form the basis for the forward-looking quality of the IRB
parameters. IRB validation should encompass the processes for assigning those estimates
including the governance and control procedures in a bank.
Principle 2: The bank has primary responsibility for validation
Supervisors do not have the primary responsibility for validating bank rating systems.
Rather, a bank has the primary role, and consequently must validate its own rating systems
to demonstrate how it arrived at its risk estimates and confirm that its processes for
assigning risk estimates are likely to work as intended and continue to perform as expected.
Supervisors, on the other hand, should review the bank’s validation processes and
outcomes and may rely upon additional processes of their own design, or even those of third
parties, in order to have the required level of supervisory comfort or assurance.
Principle 3: Validation is an iterative process
Validation is an ongoing, iterative process in which banks and supervisors periodically
refine validation tools in response to changing market and operating conditions. Banks and
supervisors will need to engage in an iterative dialogue on strengths and weaknesses of
particular rating systems.
Principle 4: There is no single validation method
Many well-known validation tools, like backtesting, benchmarking and replication, are
useful supplement to the overall goal of achieving a sound IRB system. However, there is
unanimous agreement that there is no universal tool available, which could be used across
portfolios and across markets.
Principle 5: Validation should encompass both quantitative and qualitative elements
While it might be possible to think of validation as a purely technical/mathematical
exercise in which outcomes are compared to estimates using statistical techniques, and
indeed in some circumstances such technical tools may play a critical role in such
assessments, it will likely be insufficient to focus solely on comparing predictions and
outcomes. In assessing the overall performance of a rating system, it is also important to
assess the components of the rating system (data, models, etc.) as well as the structures and
processes around the rating system. This should include an assessment of controls
(including independence), documentation, internal use, and other relevant qualitative
factors.
Principle 6: Validation processes and outcomes should be subject to independent review
It is important that a bank’s validation processes and results should be reviewed for
integrity by parties within the banking organisation that are independent of those
accountable for the design and implementation of the validation process. This independent
review is a process that may be accomplished using a variety of structural forms. The
activities of the review process may be distributed across multiple units or housed within
one unit, depending on the varying management and oversight frameworks of banks. As an
example, internal audit could be charged with undertaking this review process using
internal technical experts or third parties independent from those responsible for building
and validating the bank's rating system. Regardless of the bank's control structure, internal
audit has an oversight responsibility to ensure that validation processes are implemented as
designed and are effective.
Following the BCBS elaboration of the term “validation”, we consider three mutually
supporting ways to validate banks’ internal rating systems. Together these encompass a
range of processes and activities that contribute to the overall assessment and final
judgement. More specifically, they relate directly to the application of Principles 4 and 5 of
the BCBS newsletter discussed above.
Component-based validation: analyses each of the three elements – data collection and
compilation, quantitative procedure and human influence – for appropriateness and
workability.
Result-based validation (also known as backtesting): analyses the rating system’s
quantification of credit risk ex post.
Process-based validation: analyses the rating system’s interfaces with other processes in
the bank and how the rating system is integrated into the bank’s overall management
structure.
(a) Process-based Validation
Validating rating processes includes analysing the extent to which an internal rating system
is used in daily banking business. This use test of ratings and the associated risk estimates
is one of the key requirements of the Basel II framework. There are two different levels of
validation: firstly, the plausibility of the actual rating in itself, and secondly, the integration
of the rating output into operational procedures and its interaction with other processes.
This incorporates the following:
(i) Understanding the rating system: It is fundamental to both types of analysis that
employees understand the rating methodology used. The learning process should
not be restricted to loan officers. As mentioned above, it should also include those
employees who are involved in the rating process. In-house training courses and
other training measures are required to ensure that the process operates properly.
(ii) Importance for management: Adequate corporate governance is crucial for banks.
In the case of a rating system, this means that executive management, and to a
certain extent the supervisory board, must take responsibility for authorising the
rating methods and their implementation in the bank’s day-to-day business. We
would expect different rating methods to be used depending on the size of the
borrower, taking account of the borrowers’ different risk content and the
relevance of the incoming information following the decision by senior
management.
(iii) Internal monitoring processes: The monitoring process must cover at least the
extent and the type of rating system used. In particular, it should be possible to rate
all borrowers in the system, with the final rating allocated before credit is granted.
If the rating is given after credit has been granted, this raises doubts about the
usefulness of internal rating. The same applies to a rating which is not subject to a
regular check. There should be a check at least annually and whenever new
information about the debtor is received which casts doubt on their ability to clear
their debts. The stability of the rating method over time, balanced with the need to
update the method as appropriate, is a key part of the validation. To do this, it is
necessary to show that objective criteria are incorporated so as to lay down the
conditions for a re-estimation of the quantitative rating model or to determine
whether a new rating model should be established.
(iv) Integration in the bank’s financial management structure: Unless meaningful credit
risk data is recorded for each borrower, it is impossible to perform the proper risk
pricing of loans taking into account inherent credit risk. If this is to be part of the
loan pricing process, a relationship must be determined between the individual
rating categories and the standard risk costs. However, it must be borne in mind
that the probability of default is simply a component of the calculation of the
standard risk costs and, similarly to the credit risk models, other risk parameters,
such as the LGD and EAD should also be recorded. Ultimately the gross margin on
a loan, which approximates to the difference between lending rates and refinancing
costs, can act as a yardstick for including the standard risk costs.
(b) Result-based Validation
As stated before, the major use of rating systems is forecasting the likelihood of default
based on currently available information. Rating systems such as the ones discussed above
may be seen as classification tools in the sense that they provide indications of the
obligor’s likely future status. The procedure of applying a classification tool to an obligor
in order to assess his or her future status is commonly called discrimination.
The main construction principle of rating systems can be described as “the better a grade,
the smaller the proportion of defaulters and the greater the proportion of non-defaulters
assigned to it.” Consequently, the more the defaulters’ and the non-defaulters’
distributions across the grades differ, the better a rating system discriminates.
The discriminatory power of a rating system thus denotes its ability to discriminate ex ante
between defaulting and non-defaulting borrowers. It can be assessed using a number of
statistical measures of discrimination, some of which are described in this section.
However, an absolute measure of discriminatory power is of limited meaningfulness on its
own. A direct comparison of different rating systems, for example, can only be performed
if statistical “noise” is taken into account. In general, the smaller the available default
sample, the greater an issue this noise becomes. For this reason, statistical tools for the
comparison of rating systems are also
presented. Some of the tools, in particular the Accuracy Ratio and the Receiver Operating
Characteristic, explicitly take into account the size of the default sample. Moreover, the
discriminatory power should be tested not only in the development dataset but also in an
independent dataset (out-of-sample validation). Otherwise there is a danger that the
discriminatory power may be overstated by over-fitting to the development dataset. In this
case the rating system will frequently exhibit a relatively low discriminatory power on
datasets that are independent of, but structurally similar to, the development dataset. Hence
the rating system would have a low stability. A characteristic feature of a stable rating
system is that it adequately models the causal relation between risk factors and
creditworthiness. It avoids spurious dependencies derived from empirical correlations. In
contrast to stable systems, unstable systems frequently show a sharply declining level of
forecasting accuracy over time.
In practice, in some developing economies rating systems are used in credit-granting
decisions with a high number of overrides. In other developing countries, as well as in
developed economies, they form the basis for pricing credits and calculating risk premiums
and capital charges. For these purposes, each rating grade or score value must be associated
with a PD that gives a quantitative assessment of the likelihood with which obligors graded
this way will default. Additionally, under both IRB approaches and the Reserve Bank of
Zimbabwe credit risk modified approach, a bank’s capital requirements are determined by
internal estimates of the risk parameters for each exposure. These are derived in turn from
the bank’s internal rating scores. The set of parameters includes the borrower’s credit risk
grade and Probability of Default.
The other dimension in the validation of rating systems is to validate the calibration of the
rating system. As the risk parameters can be determined by the bank itself, the quality of
the calibration is an important prudential criterion for assessing rating systems.
Checking discriminatory power and checking calibration are different tasks. As the ability
of discrimination depends on the difference of the defaulters’ and non-defaulters’
respective distributions on the rating grades, some measures of discriminatory power
summarise the differences of the probability densities of these distributions. Alternatively,
the variation of the default probabilities that are assigned to the grades can be measured. In
contrast, correct calibration of a rating system means that the PD estimates are accurate.
Hence, when examining calibration the differences of forecast PDs and realised default
rates must be considered. This can be done simultaneously for all rating grades in a joint
test or separately for each rating grade, depending on whether an overall assessment or an
in-detail examination is intended.
At first glance, validating the calibration of rating systems appears to be a similar
problem to the back-testing of internal models for market risk. For market risk, add-ons to
the capital requirements can be immediately derived from the results of the back-testing
procedure. Past years’ experience gives evidence of the adequacy of this methodology.
However, the availability of historical data for market risk is much better than for credit
risk. Whereas market risk is measured on a daily basis (yielding samples of roughly 250
observations per year), observation intervals for credit risk are typically a year long, owing
to the rareness of credit events. For credit portfolios, ten data points of yearly default rates are
regarded as a long time series and the current Basel II proposals consider five year series as
sufficient. As a result, the reliability of estimates for credit risk parameters is not at all
comparable to the reliability of back-tests of internal market risk models.
Moreover, whereas in the case of market risk there is no strong evidence that the
assumption of independence of observations over time is violated, the analogous
assumption seems questionable for credit losses. This holds even more for cross-sectional
dependence of credit events (i.e. within the same year). As a consequence, standard
independence-based tests of discriminatory power and calibration are likely to be biased
when applied to credit portfolios.
The choice of a specific technique to be applied for validation should depend upon the
nature of the portfolio under consideration. Retail portfolios or portfolios of small- and
medium-sized enterprises with large records of default data are much easier to explore with
statistical methods than, for example, portfolios of sovereigns or financial institutions
where default data are sparse.
2.3.2 Assessing Discriminatory Power
There are various statistical methodologies for the assessment of discriminatory power (see
BCBS, 2005 and Tasche, 2009).
(a) Cumulative Accuracy Profile (CAP)
The Cumulative Accuracy Profile is also known as the Gini curve, Power curve or Lorenz
curve. It is a visual tool whose graph can easily be drawn if two representative samples of
scores for defaulted and non-defaulted borrowers are available. Concavity of the CAP is
equivalent to the property that the conditional probabilities of default given the underlying
scores form a decreasing function of the scores. Moreover, non-concavity indicates sub-
optimal use of information in the specification of the score function. The most common
summary index of the CAP is the Accuracy Ratio (or Gini coefficient). The shape of the
CAP depends on the proportion of defaulted and non-defaulted borrowers in the sample.
Hence a visual comparison of CAPs across different portfolios may be misleading.
Practical experience shows that the Accuracy Ratio (AR) tends to take values in the range
of 50% to 80%. However, such observations should be interpreted with care as they seem
to depend strongly on the composition of the portfolio and the number of defaulters in the
samples. Suppose we have an arbitrary rating model that produces obligor rating
scores. A high rating score is usually an indicator of a low default probability. To obtain
the CAP curve, all obligors are first ordered by their respective scores, from riskiest to
safest, i.e. from the borrower with the lowest score to the obligor with the highest score.
For a given fraction x of the total number of obligors the CAP curve is constructed by
calculating the percentage d(x) of the defaulters whose rating scores are equal to or lower
than the maximum score of fraction x. This is done for x ranging from 0% to 100%. Figure
1 illustrates CAP curves.
Figure 1: Cumulative Accuracy Profiles (Source BCBS Validation Group)
A perfect rating model will assign the lowest scores to the defaulters. For a random model
without any discriminative power, the fraction x of all obligors with the lowest rating
scores will contain x percent of all defaulters. Real rating systems will be somewhere in
between these two extremes. The quality of a rating system is measured by the Accuracy
Ratio (AR). It is defined as the ratio of the area a_R between the CAP of the rating model
being validated and the CAP of the random model to the area a_P between the CAP of the
perfect rating model and the CAP of the random model, given mathematically as:

AR = a_R / a_P
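By way of illustration, the CAP construction and the Accuracy Ratio described above can
be sketched in a few lines of Python. The function and the sample portfolio are our own
illustrations, not data from the banks studied in this paper; scores are assumed to be
ordered so that a higher score means a safer obligor.

```python
def accuracy_ratio(scores, defaults):
    """Accuracy Ratio (Gini coefficient) computed via the CAP curve.

    Obligors are ordered from riskiest (lowest score) to safest; d(x) is the
    cumulative share of defaulters captured in the riskiest fraction x.
    """
    n = len(scores)
    n_def = sum(defaults)
    order = sorted(range(n), key=lambda i: scores[i])  # riskiest first
    cum_def = 0
    area_model = 0.0   # area under the model's CAP, trapezoidal rule
    prev = 0.0
    for i in order:
        cum_def += defaults[i]
        d_x = cum_def / n_def
        area_model += (prev + d_x) / (2 * n)  # trapezoid of width 1/n
        prev = d_x
    area_random = 0.5                   # CAP of the random model: the diagonal
    area_perfect = 1 - n_def / (2 * n)  # perfect model flags all defaulters first
    # AR = area between model CAP and diagonal, over area between perfect CAP
    # and diagonal
    return (area_model - area_random) / (area_perfect - area_random)

# Hypothetical portfolio: defaulters (flagged 1) concentrated at low scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
defaults = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ar = accuracy_ratio(scores, defaults)
```

A perfect rating system yields AR = 1, a random one AR = 0; the sample above produces
an AR of roughly 0.90.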
(b) Receiver Operating Characteristic (ROC)
Like the CAP, the Receiver Operating Characteristic (ROC) is a visual tool that can be
easily constructed if two representative samples of scores for defaulted and non-defaulted
borrowers are available. The construction is slightly more complex than for CAPs but, in
contrast, does not require the sample composition to reflect the true proportion of defaulters
and non-defaulters. As with the CAP, concavity of the ROC is equivalent to the conditional
probabilities of default being a decreasing function of the underlying scores or ratings and
non-concavity indicates sub-optimal use of information in the specification of the score
function (see Engelmann et al, 2003). One of the summary indices of ROC, the ROC
measure (or Area Under the Curve, AUC), is a linear transformation of the Accuracy Ratio
mentioned above. The statistical properties of the ROC measure are well-known as it
coincides with the Mann-Whitney statistic. In particular, powerful tests are available for
comparing the ROC measure of a rating system with that of a random rating and for
comparing two or more rating systems. Also, confidence intervals for the ROC measure
can be estimated with readily available statistical software packages. By inspection of the
formulas for the intervals, it turns out that the widths of the confidence intervals are mainly
driven by the number of defaulters in the sample. The more defaulters are recorded in the
sample, the narrower the interval.
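Because the ROC measure (AUC) coincides with the normalised Mann-Whitney statistic,
it can be estimated directly by pairwise comparison of defaulter and non-defaulter scores.
The sketch below, with hypothetical scores of our own, also illustrates the linear
transformation AR = 2·AUC − 1 linking it to the Accuracy Ratio.

```python
def auc(scores_nondef, scores_def):
    """ROC measure (AUC) as the normalised Mann-Whitney statistic: the
    probability that a randomly drawn non-defaulter scores higher than a
    randomly drawn defaulter, with ties counting one half."""
    wins = 0.0
    for s_nd in scores_nondef:
        for s_d in scores_def:
            if s_nd > s_d:
                wins += 1.0
            elif s_nd == s_d:
                wins += 0.5
    return wins / (len(scores_nondef) * len(scores_def))

nondef = [4, 5, 6, 7, 8, 9]   # higher scores: safer obligors
dflt = [2, 3, 5]              # defaulters mostly received low scores
a = auc(nondef, dflt)
ar = 2 * a - 1                # linear transformation to the Accuracy Ratio
```

In practice a rank-based formula (or a statistical package) replaces the O(n²) double loop,
but the pairwise form makes the Mann-Whitney interpretation explicit.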
The Pietra index is another important summary index of ROCs. Whereas the AUC
measures the area under the ROC, the Pietra index reflects half the maximal distance of the
ROC and the diagonal in the unit square (which is just the ROC of rating systems without
any discriminatory power). As is the case with the ROC measure, the Pietra index also has
an interpretation in terms of a well-known test statistic, the Kolmogorov-Smirnov statistic.
As with the ROC measure, a test for checking the dissimilarity of a rating and the random
rating is included in almost all standard statistical software packages.
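The link between the Pietra index and the Kolmogorov-Smirnov statistic can be made
concrete: the maximal vertical distance between the ROC curve and the diagonal equals
the KS distance between the defaulters’ and non-defaulters’ empirical score distributions.
A minimal sketch with illustrative data and our own function name:

```python
def ks_statistic(scores_def, scores_nondef):
    """Kolmogorov-Smirnov distance between the empirical score distributions
    of defaulters and non-defaulters. This equals the maximal vertical
    distance of the ROC curve from the diagonal."""
    grid = sorted(set(scores_def) | set(scores_nondef))

    def ecdf(sample, x):
        # empirical cumulative distribution function at point x
        return sum(1 for s in sample if s <= x) / len(sample)

    return max(abs(ecdf(scores_def, x) - ecdf(scores_nondef, x)) for x in grid)
```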
Neither the ROC measure nor the Pietra index depends on the total portfolio probability of
default. Therefore, they may be estimated on samples with non-representative
default/non-default proportions. Similarly, figures for bank portfolios with different
fractions of defaulters may be directly compared.
For both these indices, it is not possible to define in a meaningful way a general minimum
value in order to decide if a rating system has enough discriminatory power. However, both
indices are still useful indicators for the quality of a rating system.
Significance of rejection of the null hypothesis (rating system has no more power than the
random rating) with the Mann-Whitney or Kolmogorov-Smirnov tests at a (say) 5% level
could serve as a minimum requirement for rating systems. This would take care of
statistical aspects like sample size. Lower p-values with these tests are indicators of
superior discriminatory power. However, for most rating systems used in the banking
industry, p-values will be nearly indistinguishable from zero. As a consequence, the
applicability of the p-value as an indicator of rating quality appears to be limited.
The construction of an ROC curve is illustrated in the Figure below, which shows possible
distributions of rating scores for defaulting and non-defaulting obligors. For a perfect
rating model the two distributions would be completely separated. For real rating systems,
perfect discrimination is in general not possible and the two distributions will overlap, as
illustrated below.
Figure 2: Distribution of Defaulters and Non-defaulters
(Source BCBS Validation Group)
Assume someone has to find out from the rating scores which obligors will survive during
the next period and which obligors will default. One possibility for the decision-maker
would be to introduce a cut-off value C as in Figure 2, and to classify each debtor with a
rating score lower than C as a potential defaulter and each debtor with a rating score higher
than C as a non-defaulter. Then four decision results would be possible. If the rating score
is below the cut-off value C and the debtor defaults subsequently, the decision was correct.
Otherwise the decision-maker wrongly classified a non-defaulter as a defaulter. If the rating
score is above the cut-off value and the debtor does not default, the classification was
correct. Otherwise a defaulter was incorrectly assigned to the non-defaulters’ group.
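The four decision outcomes induced by a cut-off value C can be tabulated directly. The
sketch below, with hypothetical scores and our own function name, classifies each obligor
scoring below C as a potential defaulter.

```python
def classify_at_cutoff(scores, defaults, C):
    """Tabulate the four decision outcomes for cut-off C: obligors with a
    score below C are classified as potential defaulters."""
    hits = false_alarms = misses = correct_rejections = 0
    for s, d in zip(scores, defaults):
        predicted_default = s < C
        if predicted_default and d == 1:
            hits += 1                  # defaulter correctly flagged
        elif predicted_default and d == 0:
            false_alarms += 1          # non-defaulter wrongly flagged
        elif not predicted_default and d == 1:
            misses += 1                # defaulter missed
        else:
            correct_rejections += 1    # non-defaulter correctly passed
    return hits, false_alarms, misses, correct_rejections

outcome = classify_at_cutoff(
    [1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 0, 1, 0, 0, 0, 0], C=4
)
```

Sweeping C across all scores and plotting the hit rate against the false alarm rate traces out
the ROC curve itself.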
Rauhmeier (2005) and Engelmann et al (2003) show that the AUC and the AR carry the
same information. We shall therefore focus on only one of the two methodologies, namely
the CAP.
(c) Entropy Measures
Entropy is a concept from information theory that is related to the extent of uncertainty that
is eliminated by an experiment. The observation of an obligor over time in order to decide
about her or his default status may be interpreted as such an experiment. The uncertainty of
the default status is highest if the applied rating system has no discriminatory power at all
or, equivalently, if all the rating grades have the same PD. In this situation, the entropy
concept applied to the PDs of the rating system would yield high figures since the gain in
information by finally observing the obligor’s status would be large. Minimisation of
entropy measures like Conditional Entropy, Kullback-Leibler distance, and information
value is therefore a widespread criterion for constructing rating systems or score functions
with high discriminatory power. However, these measures appear to be of limited use only
for validation purposes as no generally applicable statistical tests for comparisons are
available.
The Brier score is a sample estimator of the mean squared difference of the default
indicator variables (i.e. one in case of default and zero in case of survival) in a portfolio and
the default probability forecasts for rating categories or score values. In particular, the Brier
score does not directly measure the difference of the default probability forecast and the
true conditional probability of default given the scores. Therefore, the Brier score is not a
measure of calibration accuracy. Rather, the Brier score should be interpreted as the
residual sum of squares that result from a non-linear regression of the default indicators on
the rating or score function. As a consequence, minimising the Brier score is equivalent to
maximising the variance of the default probability forecasts (weighted with the frequencies
of the rating categories). Empirical results indicate that maximising the ROC measure
entails maximisation of this variance. In this sense, the Brier score is a measure of
discriminatory power and could be used in this sense as a part of an optimisation criterion.
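As a sample estimator, the Brier score is straightforward to compute. The sketch below
uses hypothetical PD forecasts and default indicators of our own.

```python
def brier_score(pd_forecasts, defaults):
    """Brier score: mean squared difference between the default indicator
    (1 in case of default, 0 in case of survival) and the PD forecast
    assigned to each obligor."""
    n = len(defaults)
    return sum((d - p) ** 2 for p, d in zip(pd_forecasts, defaults)) / n

# Hypothetical portfolio of four obligors with their PD forecasts.
bs = brier_score([0.1, 0.2, 0.05, 0.3], [0, 1, 0, 1])
```

A lower score indicates better forecasts on the sample, but, as noted above, no standard
statistical test accompanies the metric.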
The BCBS Validation Group concluded that the Accuracy Ratio (AR) and the ROC
measure are more meaningful than the other above-mentioned indices because of their
statistical properties. For both summary statistics, it is possible to calculate confidence
intervals in a simple way. The width of the confidence interval will depend on the particular
portfolio under consideration and on the number of defaulted obligors that is available for
the purpose of estimation. As a general rule, the smaller the number of observed defaults,
the wider the confidence interval for the AR (or the ROC measure), and hence the worse
the quality of the estimate. Consequently, these tools reflect
both the quality of a rating system and the size of the samples that the rating system is built
on. Therefore, they are helpful in identifying rating systems which require closer inspection
by a supervisor. In particular, supervisors can reliably test if a rating model is significantly
different from a model with no discriminatory power. The Brier score can be useful in the
process of developing a rating system as it also indicates which of any two rating systems
has the higher discriminatory power. However, due to the lack of statistical test procedures
applicable to the Brier score, the usefulness of this metric for validation purposes is limited.
If not enough default observations for the development of a rating or score system are
available, the construction of a shadow rating system could be considered. A shadow rating
is intended to duplicate an external rating but can be applied to obligors for which the
external rating is not available. Shadow ratings can be built when the available database
contains accounting information of enough externally rated obligors. Default probabilities
for the shadow rating will then be derived from statistics for the external rating. On
samples of borrowers for which both the shadow and the external rating are available, the
degree of concordance of the two rating systems can be measured with two rank-order
statistics, Kendall’s τ and Somers’ D. Somers’ D is a conditional version of Kendall’s τ that
coincides with the Accuracy Ratio in the case of a rating system with only two categories.
For both these metrics, tests can be performed and confidence intervals can be calculated
with some standard statistical software packages. In the case of high concordance of the
shadow rating and the external rating, the shadow rating will inherit the discriminatory
power and the calibration quality of the external rating if the portfolio under consideration
and the rating agency’s portfolio have a similar structure.
2.3.3 Calibration
Validation of the calibration of a rating system is more difficult than validation of its
discriminatory power. When considering the statistical tools it is important to note that
there are several established statistical methods for deriving PDs (Probabilities of Default)
from a rating system. First, a distinction needs to be drawn between direct and indirect
methods. In the case of the direct methods, such as Logit, Probit and Hazard Rate models,
the rating score itself can be taken as the borrower’s PD. The PD of a given rating grade is
then normally calculated as the mean of the PDs of the individual borrowers assigned to
each grade. Where the rating score cannot be taken as the PD (as in the case of discriminant
analysis), indirect methods can be used. One simple method consists of estimating the PD
for each rating grade from historical default rates. Another method is the estimation of the
score distributions of defaulting borrowers, on the one hand, and non-defaulting borrowers,
on the other. A specific PD can subsequently be assigned to each borrower using Bayes’
Formula.
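The indirect assignment of a PD via Bayes’ Formula can be sketched as follows. The
discrete score distributions and the prior default probability below are purely illustrative
numbers of our own.

```python
def pd_from_score(prior_pd, f_def, f_nondef, s):
    """Bayes' Formula: posterior default probability of a borrower with
    score s, given the estimated score distributions of defaulters (f_def)
    and non-defaulters (f_nondef) and the portfolio-wide prior PD."""
    num = prior_pd * f_def[s]
    den = num + (1 - prior_pd) * f_nondef[s]
    return num / den

# Hypothetical discrete score distributions (probability of each score,
# conditional on default status); defaulters concentrate at low scores.
f_def = {1: 0.5, 2: 0.3, 3: 0.2}
f_nondef = {1: 0.1, 2: 0.3, 3: 0.6}
pd_low = pd_from_score(0.05, f_def, f_nondef, 1)   # risky score
pd_high = pd_from_score(0.05, f_def, f_nondef, 3)  # safe score
```

With a 5% portfolio-wide PD, the posterior PD for the riskiest score is roughly 21%, while
the safest score carries a posterior PD below 2%.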
In practice, a bank’s PD estimates will differ from the default rates that are observed
afterwards. The key question is whether the deviations are purely random or whether they
occur systematically. A systematic underestimation of PDs merits a critical assessment,
from the point of view of supervisors and bankers alike, since in this case the bank’s
computed capital requirement would not be commensurate with the risk it has incurred.
When independence of default events is assumed in a homogeneous portfolio, the binomial
test (most powerful among all tests at fixed level) can be applied in order to test the
correctness of a one-period default probability forecast. It is known in the literature that the
true Type I error (i.e. the probability of erroneously rejecting the hypothesis of an adequate
PD forecast) can be much larger than the nominal level of the test if default events are
correlated. Efforts to take into account dependence in the binomial test, for example, by
incorporating a one factor dependence structure and Gordy’s granularity adjustment in
order to adjust for the finiteness of the sample, yield tests of rather moderate power, even
for low levels of correlation. The binomial test can be applied to one rating category at a
time only. If (say) twenty categories are tested, at 5% significance level one erroneous
rejection of the null hypothesis “correct forecast” has to be expected. This problem can be
circumvented by applying the chi-square (or Hosmer-Lemeshow) test to check several
rating categories simultaneously. This test is based on the assumption of independence and
a normal approximation. Due to the dependence of default events that is observed in
practice and the generally low frequency of default events, the chi-square test is also likely
to underestimate the true Type I error.
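Under the independence assumption, the one-sided binomial test for a single rating grade
can be sketched as follows; the figures are illustrative, and, as noted above, correlated
defaults make the true Type I error larger than the nominal level in practice.

```python
from math import comb

def binomial_pvalue(n, d_obs, pd_forecast):
    """One-sided exact binomial test assuming independent defaults: the
    probability of observing at least d_obs defaults among n obligors if
    the forecast PD is correct. Small p-values suggest the PD is
    underestimated."""
    return sum(
        comb(n, k) * pd_forecast ** k * (1 - pd_forecast) ** (n - k)
        for k in range(d_obs, n + 1)
    )

# Grade with 100 obligors and a 1% PD forecast: observing 5 defaults would
# be rejected at the 5% level under independence.
p = binomial_pvalue(100, 5, 0.01)
```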
The normal test is an approach to deal with the dependence problem that occurs in the case
of the binomial and chi-square tests. The normal test is a multi-period test of correctness of
a default probability forecast for a single rating category. It is applied under the assumption
that the mean default rate does not vary too much over time and that default events in
different years are independent. The normal test is motivated by the Central Limit Theorem
and is based on a normal approximation of the distribution of the time-averaged default
rates. Cross-sectional dependence is admissible. Simulation studies show that the quality of
the normal approximation is moderate but exhibits a conservative bias. As a consequence,
the true Type I error tends to be lower than the nominal level of the test, i.e. the proportion
of erroneous rejections of PD forecasts will be smaller than might be expected from the
formal confidence level of the test. The test seems even to be, to a certain degree, robust
against a violation of the assumption that defaults are independent over time. However, the
power of the test is moderate, in particular for short time series (for example five years).
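A minimal sketch of the normal test statistic, assuming a roughly constant PD forecast and
independence of default rates across years; the sample figures are our own.

```python
from math import sqrt

def normal_test_z(yearly_default_rates, pd_forecast):
    """Multi-period 'normal test': standardised deviation of the
    time-averaged default rate from the forecast PD, assuming default rates
    in different years are independent. Values above roughly 1.645 reject
    the forecast at the 5% level (one-sided)."""
    T = len(yearly_default_rates)
    mean = sum(yearly_default_rates) / T
    # sample variance of the yearly default rates
    var = sum((r - mean) ** 2 for r in yearly_default_rates) / (T - 1)
    return sqrt(T) * (mean - pd_forecast) / sqrt(var)

rates = [0.02, 0.03, 0.025, 0.035, 0.03]  # five years of default rates
z_low_pd = normal_test_z(rates, 0.02)     # 2% forecast looks too optimistic
z_ok_pd = normal_test_z(rates, 0.03)      # 3% forecast is not rejected
```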
For the supervisory evaluation of internal market risk models, the so-called traffic lights
approach has proved to be a valuable instrument. This approach was introduced with the
1996 Market Risk Amendment. Banks use their internal market risk models in order to
forecast a certain amount of losses (Value-at-Risk) that will not be exceeded by the realised
losses with a high probability of 99%. Depending on the number of observed exceedances,
the so-called multiplication factor that is applied to the Value-at-Risk estimate is increased.
There is a green zone of exceedances where no increment to the multiplication factor is
necessary. In the yellow zone, the increment is effectively proportional to the number of
exceedances, whereas in the red zone the maximum value for the increment has to be
applied.
The concept of a traffic lights approach can be transferred to the validation of PD
estimates. However, it is unlikely that direct consequences for the capital requirements of a
bank can be derived from this approach. A recently proposed version of a traffic lights
approach is, in contrast to the normal test, completely independent of any assumption of
constant or nearly constant PDs over time. It can be considered as a multi-period back-
testing tool for a single rating category that is based on the assumption of cross-sectional
and inter-temporal independence of default events. The distribution of the number of
defaults in one year is approximated with a normal distribution. Based on the quartiles of
this normal distribution, the number of defaults is mapped to one of the four traffic light
colours: green, yellow, orange, and red. This mapping results in a multinomial distribution
of the numbers of colours when observed over time. Inference on the adequacy of default
probability forecasts this way becomes feasible. By construction of the tool with a normal
approximation that neglects potential cross-sectional and inter-temporal correlations, higher
than expected frequencies of type I errors (i.e. erroneous rejections of default probability
forecasts) may occur. As a consequence, this traffic lights approach is conservative in the
sense of yielding relatively more false alerts than not detecting bad calibrations. Simulation
results indicate that the traffic lights approach is not too conservative since the frequency of
false alerts can be kept under control. Furthermore the simulation study suggests that the
type II errors (i.e. the probabilities of accepting biased estimates as correct) are not higher
than those of the normal test.
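A simplified version of such a traffic lights mapping is sketched below. The colour
thresholds (the 50%, 75% and 90% points of the approximating normal distribution) are
illustrative choices of ours, not the calibration of any published proposal.

```python
from math import sqrt, erf

def traffic_light(n, defaults_observed, pd_forecast):
    """Map one year's default count to a traffic-light colour via a normal
    approximation of the default-count distribution under the PD forecast.
    Thresholds at the 50%/75%/90% quantiles are illustrative only."""
    mu = n * pd_forecast
    sigma = sqrt(n * pd_forecast * (1 - pd_forecast))
    # probability of observing at most this many defaults under the forecast
    u = 0.5 * (1 + erf((defaults_observed - mu) / (sigma * sqrt(2))))
    if u <= 0.50:
        return "green"
    if u <= 0.75:
        return "yellow"
    if u <= 0.90:
        return "orange"
    return "red"
```

Counting the colours observed over several years then yields the multinomial distribution
on which inference about the adequacy of the PD forecasts is based.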
It is worth noting that no really powerful tests of adequate calibration are currently
available. Due to the correlation effects that have to be respected there even
seems to be no way to develop such tests. Existing tests are rather conservative – such as
the binomial test and the chi-square test – or will only detect the most obvious cases of
miscalibration as in the case of the normal test. As long as validation of default
probabilities per rating category is required, the traffic lights testing procedure appears to
be a promising tool because it can be applied in nearly every situation that might occur in
practice. Nevertheless, it should be emphasised that there is no methodology to fit all
situations that might occur in the validation process. Depending on the specific
circumstances, the composition of a mixture of different techniques will be the most
appropriate way to tackle the validation exercise.
2.3.4 Benchmarking
In the context of validation, benchmarking can be defined as a comparison of internal
ratings and estimates with externally observable (whether public or non public)
information. For a banking institution’s internal rating system, examples of public
benchmarks include the ratings assigned by rating agencies such as S&P or Moody’s (see
Sobehart et al, 2000). In the case of the major rating agencies, detailed information about
the firms they rate is usually available, which makes testing and analysis of their rating
systems more feasible. Other examples of public but harder-to-analyse benchmarks include
Moody’s KMV EDFs, which are usually well disclosed but difficult to test as the
technology used is proprietary. Non-public benchmarks are typically supervisory
benchmarks that are usually not disclosed.
The most straightforward benchmarking is usually carried out for PD estimates because
they are obligor-specific and therefore it is relatively easy to define a set of borrowers
which are benchmarked. Generally, PDs are expressed on a universally understood zero-to-
one interval scale. It is more difficult to use benchmarking for EAD and LGD estimates
because they are exposure-specific, though such practice is growing. Benchmarking of the
ratings that often underlie estimates is even more difficult because of the need to map
different rating scales to a common scale.
As stated above, our discussion focuses on PD validation; accordingly, the treatment of
benchmarking here concerns PD and rating system benchmarking. In this context,
benchmarking refers to the mapping of internal ratings to an external rating system.
(a) Objectives of benchmarking
In a paper by the BCBS on Studies on the Validation of Internal Rating Systems, the
validation group agreed that no single, unambiguous and complete statistical test has
been developed that enables validation of all the facets of an internal rating system. Difficulties
mainly relate to the effect of default correlation, data constraints, and the definition of
meaningful and robust target criteria for validating rating systems. In this respect,
benchmarking is often viewed as a complement to formal statistical backtesting of internal
rating systems. As a matter of fact, benchmarking does appear in many aspects to be part of
the whole process of producing internally generated estimates in banks’ internal systems.
For example, banks frequently use external and independent references to calibrate their
own rating systems in terms of PD. However, a bank’s internal rating should reflect its
internal risk management practices, and should not be a mere replication of an external
benchmark model.
In principle, it is possible to differentiate between two ways of carrying out benchmarking
for a certain set of borrowers or exposures:
(i) Comparison of the internal estimates of risk components (e.g. PD) across a panel. For
example, banks or supervisors may wish to compare PD estimates on corporates with
respect to a peer group. The main purpose is to assess the correlation of the estimates
or, conversely, to identify potential “outliers” (this can be done using variance
analysis or robust regression), but not to determine whether these estimates are accurate.
(ii) Comparison of internal estimates with an external and independent benchmark, for
example, a rating provided by a supervisory authority or rating agency. Here the
external benchmark is implicitly given a special credibility, and deviations from this
benchmark provide a reason to review the internal estimates. In this approach, the
benchmark is used to calibrate and/or validate internal estimates. Given difficulties with
identifying absolute benchmarks, one should be critical when using benchmarks.
In either case, benchmarking appears to be part of validation but may to some extent be
more flexible, as it allows banks and supervisors to decide what benchmark is most
appropriate and to enforce decision rules on how the IRB system should behave. In this
respect, benchmarking replaces a purely formal validation process (which is not always
available) with a more empirical and operational approach.
The first approach is of particular interest to supervisors and can be pursued in any
jurisdiction where banks use internal rating systems for capital purposes or as part of their
major processes. This can be done by obtaining PD estimates for a common set of
borrowers from different banks. However, although simple to implement, this approach
raises several difficulties.
(i) A major technical problem is often the identification of common borrowers across
banks. This can be alternatively viewed as the construction of a peer group. Depending
on the information sources available, tax codes, identification codes from public or
private credit registers or manual selection may solve this technical problem.
(ii) Once these common borrowers have been identified, comparing the different ratings
across banks would require their mapping to a master scale. This issue is similar to
mapping to a benchmark as explained below.
(iii) Once PDs for a peer group have been collected, benchmarking may support further
analysis: a widely used approach is to use benchmarking to identify outliers. In this
respect, benchmarking can also be viewed as part of non parametric tests to detect
potential and systematic bias in a bank’s methodology. In practice, however, identifying
outliers may prove difficult, as differences in estimates may simply stem from
differences in methodologies. For example, PD estimates may differ because of a
different definition of default. Thus, benchmarking might be regarded more as a
variance analysis which can still be useful as it provides a qualitative indicator of
potential differences in techniques used within the peer group. Such differences need
of course to be further analysed and their impact on the banking institutions and the
system as a whole must be understood.
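The outlier screening described in (iii) can be sketched with a robust (median/MAD)
z-score. The Python snippet below is purely illustrative: the bank names, PD values and
the 3.5 cut-off are hypothetical, not taken from any peer-group exercise in this paper.

```python
from statistics import median

def flag_outlier_banks(pd_estimates, threshold=3.5):
    """Flag banks whose PD estimate for a common borrower deviates markedly
    from the peer-group consensus, using the modified z-score
    (0.6745 * deviation from the median / median absolute deviation)."""
    values = list(pd_estimates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # all estimates (nearly) identical: nothing to flag
        return []
    return [bank for bank, v in pd_estimates.items()
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical peer-group PD estimates for one common borrower:
peers = {"Bank A": 0.010, "Bank B": 0.012, "Bank C": 0.011, "Bank D": 0.050}
print(flag_outlier_banks(peers))  # Bank D stands apart from the peer group
```

As the text stresses, a flag raised this way signals a methodological difference to
investigate (for example a different definition of default), not necessarily an error.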
Pursuing the second approach, involving an external benchmark, raises two major
concerns:
(i) The selection of the benchmark: selecting an appropriate benchmark may not be such an
obvious exercise as it requires some prior knowledge or some inference of the features
of the underlying model. Choosing a benchmark PD, for example, depends upon
whether the PD analysed is stressed or unstressed, and dynamic or static.
(ii) The mapping to the benchmark: the mapping refers to the one-to-one relation that can
be inferred between the unobserved model and its benchmark. Ideally, in the case of a
perfectly matching benchmark, this relationship will be perfectly one-to-one, but this
will not be true in the general case. As such, formalising this relationship may be quite
difficult.
As a matter of fact, whether used with a relative or absolute objective, a comparison with
an external benchmark may in practice still appear to be rather subjective regarding the last
two aspects, i.e. what benchmark to use and how to use it, the difficulty reflecting the fact
that both issues are related. Most of the benchmarks used are, for example, PDs from rating
agencies or commercial vendor models, often regardless of their true adequacy. As a
consequence, mapping procedures may often be simple and misleading.
Yet, benefits of using benchmarking as a complement to validation could be greater if a
more objective approach to constructing decision rules for banks and supervisors was used.
Moreover, if benchmarks and mapping methodologies were available, validation could be
less costly and burdensome. Supervisory evaluation could then focus on assessing the
quality of the benchmark and the quality of the mapping. In addition, it could allow useful
inference on the underlying characteristics of the IRB system with respect to the
benchmark. The following sections look at the need to formalise the selection of
benchmarks and the mapping procedure.
Moreover, there is no clear distinction between the range of PD values that a grade spans
and the fact that the distribution of PD values of obligors assigned to the grade may not be
uniform. For example, even though a grade might be properly defined to span the PD
interval [.01,.02), and the average PD estimated for the grade might be .015, the true
average PD for the obligors assigned to the grade at any given time might be higher or
lower depending on the distribution of individual obligor PDs. Moreover, changes in the
distribution over time might change the true average PD.
(b) Selection of Benchmarks
In practice, credit risk modeling in banks follows a rather “bottom up” approach.
Notwithstanding reasons related to the empirical calibration of models, the segmentation of
credits by portfolios (bank, sovereign, corporate, etc.) is often justified by an operational
reality; banks develop different risk management and commercial techniques depending on
the business activity. This segmentation actually reflects the fact that credit risk is governed
by economic factors which are specific to the type of portfolio.
In this respect, the segmentation of portfolios according to specific economic
characteristics entails that the underlying default characteristics are also different. This
means, for example, that the factors governing risks on, say, banks are not necessarily
the same, and do not necessarily follow the same dynamics, as those governing, say, corporates.
It appears then that in a bottom-up approach, which is most likely to be used by banks, a
portfolio’s segmentation would necessitate specific default models which are deemed to be
quite different. This observation has two major implications:
(i) With respect to the selection of benchmarks, considering ratings benchmarking as a
mapping to an external rating system entails that the selection of an appropriate
benchmark would rest upon the assessment of its qualities in adequately representing
the expected economic characteristics of the portfolio studied. For example, many
banks use rating agencies grades as benchmarks not only for their corporate portfolios
but also more extensively for their SME and SME retail portfolio. It is worth noting
that selecting a benchmark is dependent on the relationship of the portfolio with the
general economic environment. One should question whether benchmarking PD
estimates of SME portfolios on say, S&P ratings, is consistent (in terms of granularity,
calibration, discriminative power, etc.). For example, some banks would use the same
benchmark (e.g. S&P), to classify risks for their corporate, SME and SME retail
portfolios. One may therefore question whether the granularity of a rating system
benchmarked on S&P ratings (about 20 classes) is suitable for describing SME and
retail SME risks. It appears likely that this granularity would be excessive for SME
risks, thus entailing the possibility of non-significant or non-discriminative risk
buckets. Conversely, for retail SME the excess granularity is expected to be even
greater, thus entailing a likely too low discriminative power.
(ii) With respect to the aggregation on a master scale, an issue may also arise
regarding the consistency of aggregating the specific underlying default models into a
master default model, and ultimately the consistency of the master default model obtained.
Overall, inconsistencies in the rating criteria, dynamic properties, and granularity of two
rating systems make benchmarking exercises involving such disparate systems
operationally and conceptually difficult, which reduces the value of such exercises. Thus,
consistency in benchmarks is desirable. Benchmarking generally requires a mapping
procedure, i.e. rules relating unambiguously one rating system to the other, to be defined.
Unfortunately, mapping procedures often appear to be rather simple or crude. Most of the
time, this mapping rests on empirical comparisons of average PDs as a basis for grouping
and matching risk buckets on a master scale. A distinction should be made between the
range of PD values that a grade spans and the fact that the distribution of PD values of
obligors assigned to the grade may not necessarily be uniform.
While simple to implement, such an approach may not seem satisfactory on theoretical
grounds and with respect to validation purposes for two reasons. First, the PDs compared
are not necessarily homogenous. They depend on the definition of default used, the rating
philosophy (TTC or PIT), and the conditioning (stressed, unstressed). These properties are
linked to the underlying default model and would need to be inferred in the first place as
suggested by the need to classify rating systems according to their dynamic properties.
Second, even in the case where the PDs compared are homogenous, this approach does not
take into account the granularities of each rating system (see above) which are proxies of
the true distribution of values of the underlying default model. The problem stems from the
fact that the distribution of obligors on a bucket is not observed, only an average PD.
Merging buckets on the ground of average PD implies an assumption that merging the
corresponding distribution does not alter the resulting average PD. This may be true, but is
not in the general case. Regarding PD, the problem of mapping could be considered as
optimising the granularity of the master scale in order to minimise the loss of information.
With respect to validation, special attention should therefore be given to the appropriate
granularity of the master scale and benchmark used. As mentioned before, this would need
some inference of the economic characteristics of the benchmark or master model.
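The point about merging buckets can be made concrete with a small numerical sketch (the
bucket sizes and PDs below are hypothetical): the average PD of a merged bucket depends
on how obligors are distributed across the source buckets, which is exactly the
information an average-PD mapping discards.

```python
def merged_average_pd(buckets):
    """Obligor-count-weighted average PD of merged rating buckets.
    Each bucket is a (number_of_obligors, average_pd) pair."""
    total = sum(n for n, _ in buckets)
    return sum(n * pd for n, pd in buckets) / total

# Two buckets with average PDs of 1% and 2%:
print(merged_average_pd([(100, 0.01), (100, 0.02)]))  # 0.015 with equal populations
print(merged_average_pd([(300, 0.01), (100, 0.02)]))  # 0.0125 when obligors cluster in the safer bucket
```

The merged average moves with the obligor distribution even though the two bucket-level
PDs are unchanged, illustrating why matching buckets on average PD alone can mislead.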
For example, consider a bank’s portfolio specialised in corporate and SME lending.
Assume the risks on the SME portfolio are discriminated using a 10-grade rating scale,
while the risks on the corporate portfolio are discriminated using a 20-grade rating scale.
On both portfolios, the bank uses specific default models to derive its PD estimates. In
theory if the bank wishes to know its total risks on the whole portfolio, it would need to
somehow aggregate the two sub portfolios. One way to do this is to calculate separately
capital requirements on each sub portfolio and to aggregate them. This approach raises
consistency issues between models (see below). A more usual approach is to build a master
scale and its corresponding master default model. If the master scale has only 15 grades,
then information will be lost on the risk distribution of the underlying corporate risks. If 20
grades are used, then non significant or redundant risk buckets may be added for the
description of the underlying SME risks. Overall the discriminative power of the master
scale is likely to be affected. In the figure below we illustrate possible scenarios
that can result from benchmarking a rating system. Suppose we plot the distribution of
the internal and external grades on the same axis, from less risky to more risky. The
figure below shows possible trends that can arise from such an exercise.
Figure 3: Plot of Internal vs External Rating System
If the internal rating system tracks the benchmark consistently, the dispersion of the
data points around a common trend line with negative gradient should be marginal. The
greater the dispersion, the greater the disagreement between the rating systems; under
the underlying assumption that the benchmark is accurate, this implies that the internal
rating system is inaccurate.
If the plot shows two parallel trend lines, as in Trend 2 above, then the internal
rating system is either conservative or generous depending on the position of the
internal rating system’s trend line relative to the benchmark trend line. If the
benchmark lies above the internal rating scale then the internal rating scale is
conservative, and vice versa.
If the trend lines of the benchmark and the internal rating system intersect, then the
internal rating system is conservative over some range of risk levels while generous
over others.
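One simple way to quantify the agreement pictured above is a rank correlation between
the internal and external grades of common obligors. The sketch below computes
Spearman’s coefficient by hand; it assumes grades have been coded as numbers without
ties (tied grades would require average ranks), which is an assumption, not a step
prescribed in this paper.

```python
def spearman(internal, external):
    """Spearman rank correlation between two gradings of the same obligors.
    +1: perfect agreement in ordering; -1: perfectly reversed ordering."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(internal), ranks(external)
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0: internal tracks the benchmark
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0: orderings fully disagree
```

A coefficient near 1 corresponds to marginal dispersion around the common trend line;
values well below 1 correspond to the wider scatter discussed above.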
In this chapter we were able to review and discuss tools and techniques on rating systems
and their validation. In the next chapter we present the methodology that we are going to
use in facilitating the assessment and validation of rating systems.
Chapter 3: Methodology
3.1 Introduction
We use obligor internal ratings data from two banking institutions to undertake
validation of their internal ratings and, in turn, of their credit rating systems. The
banks operate in Zimbabwe and their rating systems and parameters were calibrated over
time. We shall reference
the two banks as Bank 1 and Bank 2. Below we give a description of the ratings data
constructed from the internal rating systems of both banks before explaining the
methodology we will employ in validation.
3.2 Data Description
3.2.1 Bank 1
Bank 1 has an internal rating system that has five rating grades and an associated
Probability of Default (PD) master scale. Four grades are non-default grades while one is a
default grade. It is worth noting that the rating system’s number of rating grades is
below the recommended minimum of seven non-default grades plus one default grade. This
is based on the provisions of paragraph 403 of the Basel II document. The BCBS justifies
the provision by explaining that it ensures that there is no concentration of
obligors in some rating grades. Lack of granularity in a rating system thus causes
concentration of obligors in some grades and in turn may compromise the discriminatory
power of the rating system.
The rating system was calibrated using the bank’s credit data spanning five years. The
calibration of PDs using five years data is in line with Basel II provisions of paragraph 264
which requires a data history of at least five years.
Bank 1 classifies its obligors into the A, B, C, D, and E grades, where grade E is a default
grade.
The bank has created a dummy rating class that it terms E2. These are accounts that
exhibit structural weaknesses and need close attention; the bank prudentially classifies
this grade as E for reporting purposes. E2 contains obligors that have not defaulted by
definition, but that the bank perceives as certain to default in the near future.
The scorecard used by the bank considers three factors, namely, financial factors, industry
specific factors and obligor specific factors such as corporate governance and management
structures.
The financial factors considered are:
(i) capacity to service facility if extended. This is inferred using the obligor’s
business financial ratios such as the Return on Capital (ROC), interest coverage
ratio, and the solvency ratio;
(ii) ability to service facilities when the business cycle is on the down turn. This is
inferred using the debt to equity ratio and the current ratio. Generally the factors
considered assess the entity’s correlation with the business cycle;
(iii) counterparty’s ability to raise additional funding. This is assessed by the quality
of management and ownership including if the entity has access to public funds
through the stock market; and
(iv) historical performance of the institution. To analyse this, the bank uses at least
three years data of the historical performance of the entity. The bank insists on
audited financials but in the absence of such it uses unaudited financials
conservatively.
The bank uses the same scorecard for its SME retail portfolio just as for corporations. The
use of the same scorecard for the SME retail portfolio and its corporate portfolio may
distort obligor migrations across grades as these portfolios have different idiosyncratic as
well as systemic factors that affect their performance.
(a) Portfolio Distribution
The bank’s distribution of ratings was as given in the figure below:
Figure 4: Ratings Distribution by Class
(b) Ratings Classification
The grade E is defined as the default grade according to the Basel II definition as it also
contains obligors that are 90 days past due. The crude default rates in Table 3 below show
that there were 25 defaults in the portfolio, all from A and D classes.
Table 3: Crude Default Rates
As discussed above, the few rating grades have resulted in a concentration of ratings in
the first grades. Rating class A contains 75.28% of total non-defaulted obligors in the
portfolio while rating class B contains 21.21% of total non-defaulted obligors. Such a
scenario is typical of rating systems that lack granularity. The concentration of
obligors in the first class poses a danger of concentration of defaults in the first
class, as reflected above.
Rating Class   Total in Class   Expected No. of Defaults/year   Default Rate (Annualised)   Master Scale PD
A              6488             84                              1.29%                       0.009
B              1828             0                               0.00%                       0.0045
C              623              0                               0.00%                       0.00895
D              139              15                              11%                         0.01845
3.2.2 Bank 2
Bank 2’s internal rating system discriminates counterparties mainly with respect to the
counterparty’s financials. The rating system has five grades, four non-default grades and
one default class, as shown in Table 4.
Table 4: Client Quality Classification
In forecasting obligor default, Bank 2’s rating system takes into consideration both obligor
specific characteristics and systemic factors that are likely to affect the credit worthiness of
the obligor.
The factors include:
(i) financial status, which is assessed using turnover and profitability, liquidity
and leverage ratios;
(ii) ability to service and repay the facility, which is assessed using cash flow
projections provided by the counterparty;
(iii) general factors that are systematic in nature, such as industry and product
type; and
(iv) management quality.
Grade   Description   Map to S&P Rating Scale   S&P PD Master Scale
A       Strong        AAA                       0.01%
B       Good          AA-A                      1.05%
C       Acceptable    BBB-B                     5.00%
D       Marginal      B-CCC                     22%
E       Default       D                         100%
Portfolio Distribution
The bank’s distribution of ratings is as given in the figure below:
Figure 5: Ratings Distribution by Class
3.3 Ratings and Rating Model Validation
As discussed in chapter 2, ratings validation is the backtesting of the final assigned ratings,
and assessment of associated long-run average one-year default probabilities (from the
master scale) against the average realised one-year default rate. Ratings validation focuses
on establishing:
(i) whether the rating system discriminates well between defaulters and non-defaulters
ex-ante;
(ii) whether long-run average default rates are consistent with the PDs assigned to the
rating grades; and
(iii) whether rating migrations are consistent with the stated rating philosophy.
The first focus of validation stated in (i) above relates to tests of the discriminatory
power of a rating system. As discussed in chapter 2, this assesses whether use of the
rating system adds value in forecasting default ex-ante. Focus (ii) relates to the test
of calibration of the rating system, while (iii) refers to the stability of the rating
system. As stated in chapter 2, use of an unstable rating system for regulatory capital
usually results in procyclical capital levels.
3.3.1 Test for Discrimination
(a) Cumulative Accuracy Profile (CAP)
In section 2.2.3 of chapter 2 we gave a conceptual overview of the CAP approach and its
associated ratio AR. The CAP is essentially based on the notion that if the rating system
has discriminatory power then all defaults must occur from the lower classes. The
construction of the CAP was discussed in section 2.2.3 of chapter 2. We however, recall the
formular of AR below:
Where represents the area between the CAP curve of the rating model being validated
and the CAP curve of the random rating, represents the area between the CAP of the
perfect rating model and the CAP curve of the random rating system.
From Figure 1 in chapter 2 we note that the CAP curves are bounded within the unity area
square (between (0,0) and (1,1)). The CAP curve of the random rating model bisects the
square into half, thus the total area of the lower triangle is 0.5. Therefore we have the area:
12
Now, given that in the ideal case we have all defaults being assigned the lowest scores,
then the point at which the curve of the perfect model first hits 1 on the y-axis should be
equal to the probability of default of the validation sample, say p. Therefore, the area is
less than 0.5 by the area of the triangle given by points (0,0),(0,1) and(p,1). Thus:
Page 41 of 63
12
12
121
Therefore:
12
12 1
2 1
1
A programming function can be written using the trapezium rule in Visual Basic (see
Appendix 1).
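As an illustration, the same computation can be sketched in Python (an alternative to
the Visual Basic routine of Appendix 1, not a reproduction of it). This per-obligor
version assumes distinct scores; tied grades would need bucket-wise treatment.

```python
def accuracy_ratio(scores, defaults):
    """AR = a_R / a_P computed with the trapezium rule.
    scores: riskiness score per obligor (higher = riskier);
    defaults: 1 if the obligor defaulted, else 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n, total_defaults = len(scores), sum(defaults)
    xs, ys, cum = [0.0], [0.0], 0
    for k, i in enumerate(order, start=1):
        cum += defaults[i]
        xs.append(k / n)                 # cumulative share of all obligors
        ys.append(cum / total_defaults)  # cumulative share of defaulters
    # Trapezium rule: area under the model's CAP curve.
    auc = sum((xs[j] - xs[j - 1]) * (ys[j] + ys[j - 1]) / 2
              for j in range(1, len(xs)))
    a_r = auc - 0.5          # area above the random model's CAP (the diagonal)
    p = total_defaults / n
    a_p = (1 - p) / 2        # area for the perfect model
    return a_r / a_p

print(accuracy_ratio([5, 4, 3, 2], [1, 1, 0, 0]))  # 1.0 for a perfectly discriminating ordering
```

A rating system that assigns the worst scores to all defaulters yields AR = 1, while a
system that ranks defaulters as the safest obligors yields AR = -1, consistent with the
negative AR reported for Bank 1 in chapter 4.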
3.3.2 Test for Calibration
(a) Binomial Test
We use the Binomial test to investigate whether PDs are in line with observed default rates.
This is done under the hypothesis: the PDs are not underestimated.
The test is done using master scale PDs and these are tested against observed default rates
over a given period.
Given the nature of the rating system the test is conducted under the following
assumptions:
(i) Defaults are not correlated.
(ii) Default occurrences are mutually exclusive events.
(iii) Defaults are independent of time, that is, default probabilities are constant across
time.
Let D represent the number of defaults that occurred in a given period and N the total
number of counterparties in the class at the beginning of the year. Further, let PD be
the default probability forecast by the PD Master Scale as the likelihood of default
over the period.
For a class with fewer than 30 counterparties, the hypothesis that the PD is not
underestimated is rejected if:

    Bin(D - 1, N, PD) >= 1 - α

where α is the significance level, which can be 1%, 5% or 10%, and Bin(r, n, p)
represents the cumulative binomial probability of observing at most r successes out of
n trials with probability of success p.

If a class has 30 or more counterparties, a Normal approximation to the Binomial
distribution is used and the above hypothesis is rejected if:

    Φ( (D - N·PD - 0.5) / sqrt(N · PD · (1 - PD)) ) >= 1 - α
Where Φ is the cumulative standard normal distribution.
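A minimal Python sketch of this decision rule follows. The 30-counterparty cut-off and
the continuity correction are as described above; the example inputs are the Bank 1
class figures reported in chapter 4.

```python
from math import comb, erf, sqrt

def phi(x):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pd_underestimated(D, N, PD, alpha=0.01):
    """True if the hypothesis 'PD is not underestimated' is rejected,
    i.e. the observed defaults D out of N obligors are too many for PD."""
    if N < 30:
        # Exact cumulative binomial: reject if Bin(D - 1, N, PD) >= 1 - alpha.
        cdf = sum(comb(N, k) * PD**k * (1 - PD)**(N - k) for k in range(D))
        return cdf >= 1 - alpha
    # Normal approximation with continuity correction for larger classes.
    z = (D - N * PD - 0.5) / sqrt(N * PD * (1 - PD))
    return phi(z) >= 1 - alpha

# Rating classes A and B of Bank 1:
print(pd_underestimated(84, 6488, 0.009))   # True: class A PD is underestimated
print(pd_underestimated(0, 1828, 0.0045))   # False: class B PD is not underestimated
```

These two calls reproduce the PDU and PDNU conclusions reported for classes A and B in
chapter 4.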
(b) One Factor Test
If the assumption of independence in the Binomial test is relaxed, we can use a one factor
test that also incorporates systemic factors as possible causes of default. Here we assume
that the deviation of the observed PD from the long run average may be a result of systemic
factors such as a bad macroeconomic environment that affected all borrowers.
Let y represent the percentage change in the value of the assets of a borrower over a
one-year period. We assume that this change is driven by both an idiosyncratic factor
and a single systemic factor. Let ω and Z be the idiosyncratic and systemic factors
respectively, both of which are standard normal variables. Therefore we have the
relation below:

    y = sqrt(ρ) · Z + sqrt(1 - ρ) · ω

where ρ is the correlation of the borrower with the systemic factor. If we let µ be the
percentage change below which the borrower would default, then we have:

    P(y < µ) = P( sqrt(ρ) · Z + sqrt(1 - ρ) · ω < µ )
             = P( ω < (µ - sqrt(ρ) · Z) / sqrt(1 - ρ) )

By construction y is also standard normally distributed, and since the PD for the
obligor is known from the PD Master Scale this gives:

    µ = Φ⁻¹(PD)

where Φ(·) represents the cumulative standard normal distribution. Therefore,
conditional on the systemic factor Z:

    P(y < µ | Z) = Φ( (Φ⁻¹(PD) - sqrt(ρ) · Z) / sqrt(1 - ρ) )

P(y < µ | Z) can be interpreted as the default rate observed for the class, thus:

    P(y < µ | Z) = D / K

where D is the number of defaults observed over the year in the rating class and K is
the total number of obligors in the class. Therefore:

    D / K = Φ( (Φ⁻¹(PD) - sqrt(ρ) · Z) / sqrt(1 - ρ) )

and solving for Z:

    Z = ( Φ⁻¹(PD) - sqrt(1 - ρ) · Φ⁻¹(D / K) ) / sqrt(ρ)

As stated above, Z follows a standard normal distribution, therefore the probability of
realising a value no greater than Z is given by Φ(Z). At a significance level α we
reject the master scale PD if:

    Φ( ( Φ⁻¹(PD) - sqrt(1 - ρ) · Φ⁻¹(D / K) ) / sqrt(ρ) ) ≤ α
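A sketch of this test in Python, using the standard-normal inverse from the `statistics`
module. The 7% asset correlation used in the example calls is the assumption applied to
Bank 1 in chapter 4; the inputs are the class A and D figures reported there.

```python
from math import sqrt
from statistics import NormalDist

def one_factor_reject(D, K, PD, rho, alpha=0.01):
    """True if the master scale PD is rejected at level alpha, i.e. the
    systemic factor realisation Z implied by the observed default rate D/K
    is too extreme to reconcile the observations with the assigned PD."""
    nd = NormalDist()
    z = (nd.inv_cdf(PD) - sqrt(1 - rho) * nd.inv_cdf(D / K)) / sqrt(rho)
    return nd.cdf(z) <= alpha

# Bank 1, classes A and D, assuming a 7% correlation to the systemic factor:
print(one_factor_reject(84, 6488, 0.009, 0.07))   # False: class A excess explained by the downturn
print(one_factor_reject(15, 139, 0.01845, 0.07))  # True: class D PD still underestimated
```

These two calls reproduce the chapter 4 conclusion that the systemic factor accounts for
the class A failure of the Binomial test but not for the class D failure.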
(c) Assessing Rating Philosophy
To assess the rating philosophy of the rating system, i.e. whether the ratings are
Point-in-Time (PIT), Through-the-Cycle (TTC) or a hybrid, we calculate mobility indices
from the migration matrices. The mobility indices commonly used in industry are M_P and
M_SVD, defined mathematically as:

    M_P(P) = (N - trace(P)) / (N - 1)

    M_SVD(P) = (1/N) · Σ_{i=1..N} sqrt( λ_i(P~' P~) ),   with P~ = P - I
In the equations above, P denotes the migration matrix and p_ij is the percentage of
companies in rating class i that are in class j one period later. Accordingly, the entry
p_ii on the diagonal is the percentage of companies that stays in rating class i. I is
the identity matrix, i.e. a stable migration matrix without changes of rating classes,
and λ_i(M) denotes the eigenvalues of a matrix M. The higher the index (say above 0.5)
the less stable the transition matrix is, and the lower the index (say below 0.25) the
more stable the transition matrix.
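A sketch of the two indices in Python. NumPy's SVD supplies the square roots of the
eigenvalues of P~' P~ directly; the 2x2 matrix used in the example is purely
illustrative and is not taken from either bank's data.

```python
import numpy as np

def mobility_indices(P):
    """Return (M_P, M_SVD) for a square migration matrix P.
    M_P compares the diagonal mass to perfect stability (P = I);
    M_SVD averages the singular values of P - I."""
    P = np.asarray(P, dtype=float)
    N = P.shape[0]
    m_p = (N - np.trace(P)) / (N - 1)
    # Singular values of P - I are sqrt(eigenvalues of (P - I)' (P - I)).
    m_svd = np.linalg.svd(P - np.eye(N), compute_uv=False).sum() / N
    return m_p, m_svd

print(mobility_indices([[1.0, 0.0], [0.0, 1.0]]))  # both indices are 0: perfectly stable ratings
print(mobility_indices([[0.9, 0.1], [0.2, 0.8]]))  # modest mobility
```

For the identity matrix both indices are zero, matching the interpretation of I as a
stable migration matrix above.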
Chapter 4: Results
4.1 Introduction
In this chapter we present the major results of our study based on the ratings produced by
the rating systems described and methodologies discussed in chapter 3.
4.2 Analysis of Bank 1 Ratings Data
4.2.1 Ratings Migration
The following is a migration matrix from the ratings data over a quarter:
Table 5: Migration Matrix
The transition matrix shows that the default class, E, does not represent a credit quality
state that is absorbing. Thus, the default class does not represent total inability of a
counterparty to meet an obligation but represents delinquent behaviours of counterparties.
This is supported by the 0.92 probability of counterparties rated D migrating to the A class
which represents the highest credit quality.
Generally, all classes that are close to the default class E, have a non-zero probability of
migrating into rating class A. This shows that the bank may be placing greater emphasis on
loan classifications when determining borrower rating. This presents a challenge since
rating migrations will become unstable and the default class will not be absorbing. This
may also reflect the bank’s own interpretation of the definition of default. Though the
bank’s definition is equivalent to the Basel II definition of default the bank’s interpretation
and application may not be consistent with the principles outlined in chapter 2.
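Whether a default class is absorbing can be checked directly from a migration matrix.
The sketch below is illustrative only: the 3-state matrix is hypothetical (the E row is
not reported in Table 5 above), as is the helper name.

```python
def is_absorbing(P, j):
    """True if rating class j is absorbing: once in class j, an obligor
    stays there with probability 1 (row j is a unit vector at position j)."""
    row = P[j]
    return row[j] == 1.0 and all(x == 0.0 for k, x in enumerate(row) if k != j)

# Hypothetical 3-state matrix where the last class (default) is absorbing:
P = [[0.90, 0.08, 0.02],
     [0.10, 0.80, 0.10],
     [0.00, 0.00, 1.00]]
print(is_absorbing(P, 2))  # True
print(is_absorbing(P, 0))  # False
```

Applied to Bank 1's matrix, such a check would flag the non-absorbing default class
discussed above, since rated-D obligors return to class A with probability 0.92.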
The mobility index MP, for the matrix above, was found to be 1.21 implying a PIT rating
Initial Rating   Final Rating
                 A       B       C       D       E
A                0.309   0.518   0.142   0.027   0.003
B                0.954   0.013   0.033   0       0
C                0.981   0.008   0.002   0.01    0
D                0.921   0.007   0       0.043   0.029
system. A PIT rating philosophy may be appropriate for credit-granting decision making,
if used in conjunction with continuous monitoring between rating updates, but may not be
very useful for purposes of setting either regulatory capital (RC) or economic capital (EC).
A PIT rating system can at times understate capital as the business cycle varies, leading to a
bank holding less capital than the risk inherent in its portfolio. Further, a purely PIT rating
system for credit portfolio management is costly to maintain. When using such a system,
obligors must be re-rated frequently otherwise the obligors’ PDs will not reflect the current
expectation of default likelihood.
4.2.2 Tests for Discrimination
Figure 6: CAP for Ratings Data from Bank 1
The CAP plot shown above highlights that the CAP curve for the internal rating scale is
convex rather than concave. Section 2.3 highlighted that such curvature of a CAP curve
means that the internal rating scale is sub-optimal in applying available information in
rating. This is also reinforced by the CAP curve’s AR of -0.05. This may also result
from the bank using facility ratings or loan classifications, which measure delinquent
behaviour, in coming up with obligor ratings.
4.3 Tests for Calibration
(a) Binomial Test
A Binomial test of the master scale PDs on the portfolio gives the results in the following
table:
Table 6: Results of the Binomial Test
where PDU means that the PD is underestimated and PDNU means that the PD is not
underestimated. In the above table we reject the null hypothesis at the 1% significance
level if the p-value is less than 1%.
The table shows that, at a 1% significance level, the Master Scale PDs underestimate the
observed default rates for rating classes A and D of the portfolio.
However, observed PDs of the portfolio for the quarter are not underestimated by the PDs
of the master scale for the B and C classes at the 1% significance level.
It should be noted, however, that the lack of granularity of the rating scale resulted
in concentration in the first grades and in turn elevated the likelihood of a deviation
of observed PDs from their long-run average. Further, the inclusion in the portfolio of
SME retail obligors, who are highly correlated with the business cycle, may also have
contributed to the large number of defaults in class A.
The Binomial test above, as a result of the independence of defaults assumption, only
considered idiosyncratic factors. That is factors that are obligor specific with regards
default. We therefore consider a one factor test below.
Rating Class Total in ClassExpected No. of Defaults/ year
Default Rates (Annualised)
PD Master Scale p-Value (at 1% level)
Conclusion
A 6488 84 1.3% 0.9% 0.000482 PDU
B 1828 0 0.0% 0.5% 0.998853 PDNU
C 623 0 0.0% 0.9% 0.995126 PDNU
D 139 15 11.0% 1.8% 0.000000 PDU
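For reference, the one-sided binomial test applied above can be sketched in a few lines of Python. This is a minimal illustration under the independence-of-defaults assumption, not the implementation used in the study; the function name is ours.

```python
from math import comb

def binomial_pvalue(n, k, pd):
    """One-sided binomial test p-value: P(X >= k) with X ~ Binomial(n, pd).

    H0: the master scale PD does not underestimate the true default
    probability; a small p-value suggests underestimation (PDU).
    """
    # P(X >= k) = 1 - P(X <= k - 1); summing over i < k keeps the loop short.
    cdf = sum(comb(n, i) * pd**i * (1 - pd)**(n - i) for i in range(k))
    return 1.0 - cdf

# Class A of Table 6: 84 defaults among 6488 obligors, master scale PD 0.9%.
print(binomial_pvalue(6488, 84, 0.009))  # small p-value: reject H0 (PDU)
```

Exact binomial p-values may differ slightly from those in Table 6, which appear consistent with a normal approximation to the binomial distribution (for instance, P(X >= 0) is exactly 1 for the zero-default classes).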
(b) One Factor Test
If the assumption of independence in the binomial test is relaxed, we can use a one-factor
test that also incorporates systematic factors as possible causes of default. This allows us
to test whether the excess defaults were a result of the economy being in a downturn.
Applying the one-factor test to the classes for which the master scale underestimated the
actual default rate, and assuming a correlation of 7% with the business cycle, gives the
results shown in the table below.
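Under these assumptions, the one-factor p-value can be sketched as follows. This is a hedged illustration based on the asymptotic Vasicek one-factor model; the function and variable names are ours, and the correlation of 7% is the value stated above.

```python
from statistics import NormalDist

def one_factor_pvalue(dr_obs, pd, rho):
    """p-value of an observed default rate under a Vasicek one-factor model.

    dr_obs: observed (annualised) default rate; pd: master scale PD;
    rho: asset correlation with the single systematic factor.
    """
    nd = NormalDist()
    z = ((1 - rho) ** 0.5 * nd.inv_cdf(dr_obs) - nd.inv_cdf(pd)) / rho ** 0.5
    return 1 - nd.cdf(z)

# Class A: observed DR 1.3% against a master scale PD of 0.9%, rho = 7%
print(one_factor_pvalue(0.013, 0.009, 0.07))  # roughly 0.2, in line with Table 7
```

The sketch ignores finite-portfolio (granularity) effects, so its p-values will differ slightly from an implementation that models the exact number of obligors per class.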
Table 7: Results of the One Factor Test

Rating Class   Total in Class   No. of Defaults/year   Default Rate (annualised)   PD Master Scale   p-Value    Conclusion
A              6488             84                     1.3%                        0.9%              0.205833   PDNU
D              139              15                     11.0%                       1.8%              0.000367   PDU

The table shows the one-factor test for classes A and D, which failed the binomial test. It
indicates that the underestimation of the default rate by the master scale PD for class A is
attributable to systematic factors, in particular the business cycle, which was in a
downturn. For class D, however, the one-factor test still shows at the 1% significance level
that the default rate was underestimated.

The underestimation of defaults leads to the underestimation of provisions and, if the bank
uses its internally derived PDs for regulatory capital, to the underestimation of regulatory
capital. The bank would then hold credit risk capital that is not commensurate with its
credit risk profile.

4.3.1 Analysis of Bank 2 Ratings Data

The tests in this section are based on the assumption that, for a perfect rating scale, all
defaults come from the worst rating class. The CAP is constructed as shown in the diagram
below. The accuracy ratio is then computed as the ratio of the area between the rating
system's CAP curve and the diagonal, to the maximum possible such area, namely the area
between the CAP curve of a perfect rating system and the diagonal.
With time, the credit quality of borrowers changes as various macroeconomic variables
change. Thus it is critical to monitor these migrations to ensure credit decisions are
informed by relevant credit metrics. The likelihood of such migrations is measured by
transition matrices as discussed in chapter 2.
The following is a migration matrix computed from Bank 2's obligor ratings.

Table 8: Migration Matrix for Bank 2 Credit Portfolio
(rows: rating at the beginning of the period; columns: rating at the end of the period)

       A      B      C      D      E
A    1.00   0.00   0.00   0.00   0.00
B    0.00   0.92   0.00   0.00   0.08
C    0.00   0.00   0.90   0.00   0.10
D    0.00   0.00   0.00   0.65   0.35

The transition matrix shows that ratings assigned to counterparties are stable over time, as
evidenced by the high probabilities of counterparties remaining in their originally assigned
classes. Such stability is usually characteristic of a TTC rating system. However, the
complete absence of migration from an original class to any class other than default (E)
needs to be interrogated: it suggests that the bank may only rate borrowers at loan
granting, with no further rating revisions during the life of the loan. Such a practice
produces direct jumps from originally assigned ratings into the default class, and if the
bank's rating system is based on a PIT rating philosophy it may produce increasingly
biased ratings over time.

The mobility index MP of the above transition matrix was found to be 0.18, implying a
TTC rating system. A TTC rating philosophy is most appropriate for setting either
regulatory capital (RC) or economic capital (EC), as it provides ratings that are stable
through the business cycle.
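The text does not specify which mobility index MP denotes. As a hedged sketch, the simple trace-based (Shorrocks) index over the four non-default grades reproduces the reported value of 0.18 for the Table 8 matrix; the function name and the choice of index are our assumptions.

```python
def mobility_index(P):
    """Shorrocks mobility index: (n - trace(P)) / (n - 1).

    P: square transition matrix over the n rating grades (default column
    excluded). 0 means no migration; values near 1 mean high mobility.
    Only the diagonal enters the formula.
    """
    n = len(P)
    trace = sum(P[i][i] for i in range(n))
    return (n - trace) / (n - 1)

# Diagonal of Table 8 over grades A-D (migrations to the default state E
# dropped, so rows need not sum to 1; only the diagonal matters here).
P = [[1.00, 0.00, 0.00, 0.00],
     [0.00, 0.92, 0.00, 0.00],
     [0.00, 0.00, 0.90, 0.00],
     [0.00, 0.00, 0.00, 0.65]]
print(round(mobility_index(P), 2))  # → 0.18
```

The Jafry-Schuermann singular-value metric cited in the bibliography is an alternative that also accounts for off-diagonal structure; for this matrix, with no migration between non-default grades, the trace-based index is the simplest candidate that matches the reported figure.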
(a) Cumulative Accuracy Profile
Figure 7: CAP for Ratings Data from Bank 2
The maximum possible area is the area under the CAP curve of the perfect rating system
associated with the rating system under analysis.

The accuracy ratio ranges between -1 and 1 and must be greater than zero for a rating
system to add value. An accuracy ratio of 1 represents a rating system with perfect
discriminatory power, while a ratio of 0 represents a rating system that assigns borrowers
to rating grades at random and therefore has extremely weak or no discriminatory power.

The CAP of Bank 2's rating system, shown in the diagram above, was constructed using
ratings generated by the bank's rating system. The rating system has an accuracy ratio of
0.54, which shows that it adds value in discriminating between counterparties with respect
to credit quality.
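For completeness, the accuracy-ratio computation, equivalent in spirit to the VBA routine in Appendix 1, can be sketched in Python. This is a hedged illustration with our own function name, assuming grade labels whose sort order runs from worst to best.

```python
from collections import Counter

def accuracy_ratio(ratings, defaults):
    """Accuracy ratio from per-obligor grades and default flags.

    ratings: grade labels whose natural sort order runs from worst to best;
    defaults: 1 if the obligor defaulted, 0 otherwise.
    """
    n, d = len(defaults), sum(defaults)
    total, bad = Counter(), Counter()
    for grade, flag in zip(ratings, defaults):
        total[grade] += 1
        bad[grade] += flag
    x = y = area = 0.0
    for grade in sorted(total):          # walk grades from worst to best
        x1, y1 = x + total[grade] / n, y + bad[grade] / d
        area += (x1 - x) * (y + y1) / 2  # trapezoid under the CAP curve
        x, y = x1, y1
    perfect = 1 - d / (2 * n)            # area under a perfect CAP
    return (area - 0.5) / (perfect - 0.5)

# All defaults in the worst grade (grade 1): perfect discrimination
print(accuracy_ratio([1, 1, 2, 2], [1, 1, 0, 0]))  # → 1.0
```

A scale that lumps every obligor into a single grade is uninformative and returns 0.0, matching the interpretation of the accuracy ratio given above.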
4.3.2 Test for Calibration
(a) Binomial Test
In testing the calibration power of the rating system, we also apply the binomial test, this
time to the portfolio data of Bank 2. The results are shown in the table below.
Table 9: Binomial Test for Bank 2's Rating System

Rating Class   Annual Default Rate   PD Master Scale   p-Value    Conclusion
A              0.00%                 0.01%             0.998702   PDNU
B              8.00%                 1.05%             0.887813   PDNU
C              10.00%                5.00%             0.811323   PDNU
D              35.00%                22.00%            0.923510   PDNU

The rejection criterion for the null hypothesis is the same as for Bank 1 above. The table
shows that, at the 1% significance level, we fail to reject the null hypothesis that the
master scale PDs do not underestimate the observed default rates for the rating system.

In this section we presented the results of the validation of Bank 1's and Bank 2's rating
systems. The tests highlighted that Bank 1's rating system has poor discrimination and
requires re-calibration of some of its parameters. Bank 2's rating system, by contrast, has
been performing according to its intended objectives.
Chapter 5: Conclusion
In this chapter we give the main results of our research and offer our recommendations.
The research noted that currently available validation approaches enable users to
objectively determine whether rating systems are sufficient for their intended objectives.
The binomial test and the one-factor test help assess the significance of a rating system's
parameters. Interviews with the management of Bank 1 highlighted that their rating system
had not been validated since inception, despite having been in use for more than five
years. The results in chapter 4 showed that the bank's rating system uses available
borrower information sub-optimally and hence does not discriminate appropriately
between borrowers with respect to credit risk. We have thus been able to show that proper
validation techniques are necessary to ensure that rating systems operate as intended.
Using such a rating system for capital purposes is dangerous, as it facilitates a serious
understatement of the provisions and capital that the bank is supposed to hold, resulting in
inadequate buffers against expected and unexpected credit risk losses.
Bank 2’s rating system was operating as intended even though it had not been validated
since inception. It is, however, critical for the institution to start validating the rating
system to monitor its continued relevance to changes in the operating environment and
portfolio mix of borrowers. The study also noted that it is critical for banking institutions
to adopt and, most importantly, operationalise a sound definition of default, so that
defaults are identified consistently and objectively when they occur.
As has been discussed, the Zimbabwean market, like the rest of the MEFMI region, is
characterised by few corporate entities rated by reputable external rating agencies, a lack
of expertise in rating system development and maintenance, an absence of marketable
credit securities (e.g. credit default swaps) and short credit data histories. In such an
environment it may be necessary for regulators and supervisors to facilitate the creation of
a central credit risk data registry. Such a registry enables the pooling of data for the
validation and calibration of various rating parameters. Supervisors also need to develop
skills in rating validation so that they can, in turn, guide bankers on sound validation
standards.
It is therefore recommended that supervisors in the MEFMI region begin to nurture
competences in sound validation practices to ensure that these remain relevant to changes
in their financial systems on an ongoing basis. To overcome the challenges regarding
sufficiency of data, it is necessary that central banks facilitate the establishment of central
credit registers, which enable banks to determine the level of borrower leverage and to
pool data. This will also assist in providing data histories of adequate length for the
calibration and validation of various credit risk rating parameters. It is also critical that
central banks in the MEFMI region provide guidance to banks operating in their
jurisdictions on sound principles for operating and maintaining rating systems.
Appendix 1: Visual Basic Program for CAP
Function CAP(ratings, defaults)
'Function assumes input data is sorted from worst rating to best
Dim N As Long, numdef As Long, a As Integer, i As Long, K As Long
Dim xi As Double, yi As Double, xy(), area As Double
N = Application.WorksheetFunction.Count(defaults)
numdef = Application.WorksheetFunction.Sum(defaults)
'Determine number of rating categories K
K = 1
For i = 2 To N
If ratings(i) <> ratings(i - 1) Then K = K + 1
Next i
ReDim xy(1 To K + 2, 1 To 2)
'First row of function reserved for accuracy ratio, 2nd is origin (0,0)
'so start with a=3
a = 3
For i = 1 To N
'cumulative fraction of observations (xi) and defaults (yi)
xi = xi + 1 / N
yi = yi + defaults(i) / numdef
'Determine CAP points and area below CAP
If ratings(i) <> ratings(i + IIf(i = N, 0, 1)) Or i = N Then
xy(a, 1) = xi
xy(a, 2) = yi
area = area + (xy(a, 1) - xy(a - 1, 1)) * (xy(a - 1, 2) + xy(a, 2)) / 2
a = a + 1
End If
Next i
'Accuracy ratio
xy(1, 1) = (area - 0.5) / ((1 - numdef / N / 2) - 0.5)
xy(1, 2) = "(Accrat)"
CAP = xy
End Function
Bibliography
1. Basel Committee on Banking Supervision (2006). "International Convergence of Capital Measurement and Capital Standards: A Revised Framework", Comprehensive Version, Bank for International Settlements, Basel, Switzerland.
2. Basel Committee on Banking Supervision (1988). "International Convergence of Capital Measurement and Capital Standards", Bank for International Settlements, Basel, Switzerland.
3. Basel Committee on Banking Supervision (2000). "Principles for the Management of Credit Risk", Bank for International Settlements, Basel, Switzerland.
4. Basel Committee on Banking Supervision (2006). "Core Principles for Effective Banking Supervision", Bank for International Settlements, Basel, Switzerland.
5. Basel Committee on Banking Supervision (2005). "Basel Committee Newsletter No. 4", Bank for International Settlements, Basel, Switzerland.
6. Basel Committee on Banking Supervision (2005). "Studies on the Validation of Internal Rating Systems", Working Paper No. 14, Bank for International Settlements, Basel, Switzerland.
7. Blochwitz, S. and Hohl, S. (2006). In "The Basel II Risk Parameters", pp. 243-262, Springer, New York, USA.
8. Board of Governors of the Federal Reserve System (2009). "The Supervisory Capital Assessment Program: Overview of Results".
9. Board of Governors of the Federal Reserve System (2009). "The Supervisory Capital Assessment Program: Design and Implementation".
10. Colquitt, J. (2007). "Credit Risk Management", Third Edition, McGraw-Hill, New York, USA.
11. Dowd, K. (2006). "Measuring Market Risk", Second Edition, John Wiley & Sons Ltd, West Sussex, England.
12. Engelmann, B., Hayden, E. and Tasche, D. (2002). "Measuring the Discriminative Power of Rating Systems".
13. Engelmann, B., Hayden, E. and Tasche, D. (2003). "Testing Rating Accuracy", Risk.net.
14. Engelmann, B. and Rauhmeier, R. (2006). "The Basel II Risk Parameters", Springer, New York, USA.
15. Jafry, Y. and Schuermann, T. (2003). "Metrics for Comparing Credit Migration Matrices", Wharton Financial Institutions Center.
16. Löffler, G. and Posch, P. N. (2007). "Credit Risk Modeling Using Excel and VBA", John Wiley & Sons Ltd, West Sussex, England.
17. Ncube, A. and Kavuma, S. (2009). "Current Perspectives, Challenges and Consensus on Developing Domestic Financial Markets in MEFMI Countries", MEFMI Forum, Issue No. 7.
18. Rauhmeier, R. and Scheule, H. (2005). "Rating Properties and Their Implications on Basel II Capital".
19. Reserve Bank of Zimbabwe (2011). "Guideline No. 1-2011/BSD: Technical Guidance on the Implementation of Basel II in Zimbabwe".
20. Resti, A. and Sironi, A. (2007). "Risk Management and Shareholders' Value in Banking", John Wiley & Sons Ltd, West Sussex, England.
21. Sobehart, J. R., Keenan, S. C. and Stein, R. M. (2000). "Benchmarking Quantitative Default Risk Models: A Validation Methodology", Moody's Investors Service.
22. Tasche, D. (2009). "Estimating Discriminatory Power and PD Curves When the Number of Defaults is Small".