Macroeconomic and Financial Management Institute of Eastern and Southern Africa
AN ASSESSMENT OF CREDIT RISK RATING MODELS IN
RELATION TO BASEL II PROVISIONS: A CASE FOR ZIMBABWE
Bob Takavingofa
Reserve Bank of Zimbabwe
January, 2012
A Technical Paper submitted in Partial Fulfilment of the Award of
MEFMI Fellowship
Abstract
Credit risk rating systems now play a central role in banking institutions across the globe to
the extent that their performance has a direct impact on profitability and soundness of the
institutions. This has prompted vast research into the field.
In this paper we assess the performance of credit rating systems in Zimbabwe and derive
generalizations for the MEFMI region. We use an in-depth study of two banking
institutions’ internal credit risk rating systems. Various credit risk validation tests that are
currently applied in industry are explored and applied. The study concludes that proper
credit risk rating system methodology and governance are essential for the continued
relevance of the credit assessment process.
Page iii of 63
Table of Contents

Abstract .................................................................................................................................. ii
List of Figures ....................................................................................................................... iv
List of Tables ......................................................................................................................... v
Acknowledgements ............................................................................................................... vi
Chapter 1: Introduction .......................................................................................................... 1
Chapter 2: Literature Review ................................................................................................. 6
Chapter 3: Methodology ...................................................................................................... 35
Chapter 4: Results ................................................................................................................ 45
Chapter 5: Conclusion.......................................................................................................... 52
Bibliography ........................................................................................................................ 55
List of Figures

Figure 1: Cumulative Accuracy Profiles .............................................................................. 19
Figure 2: Distribution of Defaulters and Non-defaulters ..................................................... 21
Figure 3: Plot of Internal vs External Rating System .......................................................... 33
Figure 4: Ratings Distribution by Class ............................................................................... 37
Figure 5: Ratings Distribution by Class ............................................................................... 39
Figure 6: CAP for Ratings Data from Bank 1...................................................................... 46
Figure 7: CAP for Ratings Data from Bank 2...................................................................... 50
List of Tables

Table 1: Transition Matrix ..................................................................................................... 9
Table 2: Rating System Validation Dimensions .................................................................. 10
Table 3: Crude Default Rates ............................................................................................... 37
Table 4: Client Quality Classification .................................................................................. 38
Table 5: Migration Matrix for Bank 2 Credit Portfolio ....................................................... 49
Table 6: Migration Matrix ................................................................................................... 45
Table 7: Results of the Binomial Test .................................................................................. 47
Table 8: Results of the One Factor Test ............................................................................... 48
Table 9: Binomial Test for Bank 2’s Rating System ........................................................... 51
Acknowledgements
I would like to extend my sincere thanks to my mentor, Dr. Mudavanhu, for all the guidance in coming up with this technical paper.
I also thank Messrs Mataruka, Chirozva and Chiviri for all the encouragement through the
fellowship program.
I thank the MEFMI secretariat most sincerely for funding my customised training
program. I appreciate Messrs Ngalande, Ncube, Kavuma and Namagoa, Mrs. Makamba and
the entire MEFMI family for believing in me and offering me the distinguished opportunity
to be part of the fellowship program.
I would also like to acknowledge and thank Professor Petersen for the guidance at the start
of this work.
To my wife, Samukile and daughter, Thandeka, I appreciate your endurance and support. I
love you guys.
Above all I thank God, the father of our Lord Jesus Christ who has done all this, Amen.
List of Acronyms and Abbreviations
AIGV Accord Implementation Group Validation
AIG Accord Implementation Group
AR Accuracy Ratio
AUC Area Under the Curve
BCBS Basel Committee on Banking Supervision
BIS Bank for International Settlements
CAP Cumulative Accuracy Profile
EAD Exposure at Default
EDF Expected Default Frequency
IRB Internal Rating Based
LGD Loss Given Default
MEFMI Macroeconomic and Financial Management Institute
PD Probability of Default
PDNU Probability of Default Not Underestimated
PDU Probability of Default Underestimated
ROA Return on Assets
ROC Receiver Operating Characteristic
ROE Return on Equity
TTC Through-the-Cycle
PIT Point-in-Time
Chapter 1: Introduction
1.1 Background
One of the core functions of banking institutions is that of transforming short-term
liabilities into long-term assets for a premium that is commensurate with the risk inherent in
the created assets. This critical role of maturity transformation enables banks to allocate
capital resources from surplus units of the economy to deficit units, thus facilitating
economic activity. The business model of banking institutions, however, exposes them to
various risks that include credit risk. Credit risk is the risk of loss arising from a borrower
or counterparty not performing according to agreed terms.
The major challenge for banking institutions in their intermediary role is the identification
of creditworthy borrowers whose economic activities will survive beyond the tenure of
outstanding loans. The insolvency of a banking institution’s borrowers can cause
devastating consequences as was witnessed during the global financial crisis that started in
the United States (2007-2009). Further, credit risk can induce the crystallisation of other
risks such as liquidity, reputational and earnings risk. Thus banks aim for high quality
borrowers when extending credit. Extending credit to less creditworthy borrowers may,
however, be used as a strategy for higher returns as long as there are proper mitigation
strategies in place and the bank is aware of the level of credit risk inherent in the loan.
Banking institutions have thus developed tools to help them discriminate between high
quality borrowers and low quality borrowers. The tools now range from credit risk
scorecards to more complex rating systems. The credit risk models employed are now
being used for loan granting and risk pricing of the loans to ensure that banks are
compensated for the credit risk they assume. The tools are generally referred to as credit
risk rating systems. Colquitt (2007) asserts that internal credit risk rating systems have
become the cornerstone to managing a range of credit functions and now also serve as a
framework for management credit decisions in banking institutions.
The need for credit rating systems is underscored by the fact that usually more than 70% of
total balance sheet assets of a banking institution are loans, with direct implications for
exposure to credit risk. This was recognised in 1988 by the Basel Committee on Banking
Supervision (BCBS) when they released the first capital accord titled “International
Convergence of Capital Measurement and Capital Standards”, popularly known as Basel I,
which required internationally active commercial banking institutions to hold capital
commensurate with their credit risk profile. With the fast-paced evolution of financial markets
and the growing complexity of banking risks, which encouraged advances in banking institutions'
risk management systems, the BCBS further revamped its capital computation standards in 2004.
The BCBS published the “Revised International Convergence of Capital Measurement and
Capital Standards Framework” (Basel II), which increased reliance on banking
institutions' internal estimates of creditworthiness (see BCBS, 2004). This was an
appreciation by BCBS that banking institutions had developed internal credit risk rating
models that are efficient in forecasting borrower default in loan granting and monitoring.
The overarching objective of the revision was to try and mirror the complexity that banks
use when calculating their economic capital for credit risk in the computation of regulatory
capital. The Revised Capital Accord gives provisions under the advanced approaches for
banks, after meeting minimum qualitative and quantitative requirements prescribed in the
accord, to use their internal models to compute capital. For credit risk banks are allowed to
use the Internal Rating Based (IRB) approach, which has the Foundation Approach and the
Advanced Approach. Under the Foundation Internal Rating Based approach banks provide
their own estimate of the Probability of Default (PD) in the Supervisory formula used to
calculate regulatory capital for credit risk. It is thus critical to note that the accuracy of
generated ratings is of paramount importance for banking institutions operating under IRB
approaches, as it has a direct implication for the sufficiency of the regulatory capital required
to cover unexpected losses. Informed by observations made during the global financial
crisis the BCBS re-emphasised the need for robust credit risk capital parameters, especially
PDs, in their Basel III publications.
Internal rating models used by banking institutions for credit granting and pricing,
however, expose banks to model risk. Model risk is the risk of erroneous results or
interpretations due to errors in the model or in the overall modelling process. Model risk is
cited as one of the chief causes of the subprime crisis in the United States as some experts
argue that financial institutions were using erroneous models due to the wrong assumptions
they were developed from. Experts further assert that the crisis was also compounded by
the lack of supervisory oversight of model risk, as supervisors did not understand some of
the models that banks were using. The threat of model risk materialising is mitigated by
sound validation techniques. The BCBS, in the context of rating systems, defines the term
“validation” as encompassing a range of processes and activities that contribute to an
assessment of whether ratings adequately differentiate risk, and whether estimates of risk
components (such as PD, Loss Given Default (LGD), or Exposure at Default (EAD))
appropriately characterise the relevant aspects of risk. Validation of credit risk rating
systems is vital as it iteratively assesses and ensures that the systems perform according to
intended objectives. Credit risk validation provides a structured approach to assessing both
the discriminatory and calibration power of rating systems. Failure to run value-adding
validation programs for rating systems may lead to under- or over-estimation of the credit risks
inherent in loans. Under-estimation leads to under-pricing of loans, which prejudices the bank
bank as it would not be appropriately compensated for the risks undertaken and may result
in the institution’s credit portfolio being composed mainly of low-quality borrowers.
Over-estimation leads to over-pricing of loans, which may result in an exodus of high-quality
borrowers to banks where they get lower pricing in relation to their risk profile.
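The pricing effect described above can be illustrated with the standard expected-loss identity, EL = PD × LGD × EAD. The sketch below shows how an under-estimated PD flows through to an inadequate credit spread; all figures are hypothetical, chosen for illustration rather than taken from the paper.

```python
# Illustrative sketch: a mis-estimated PD feeds through the expected-loss
# identity EL = PD x LGD x EAD into the spread a bank charges. All numbers
# are hypothetical.

def expected_loss(pd_, lgd, ead):
    """Expected credit loss for a single exposure."""
    return pd_ * lgd * ead

ead = 1_000_000   # exposure at default (hypothetical loan)
lgd = 0.45        # loss given default (hypothetical)

true_pd = 0.030   # the borrower's true one-year PD
model_pd = 0.015  # an under-estimated PD from a poorly validated model

true_el = expected_loss(true_pd, lgd, ead)     # 13,500
priced_el = expected_loss(model_pd, lgd, ead)  # 6,750

# Spread (in basis points of EAD) needed just to cover expected loss:
true_spread_bp = 1e4 * true_el / ead       # 135 bp
priced_spread_bp = 1e4 * priced_el / ead   # 67.5 bp

shortfall = true_el - priced_el  # compensation the bank forgoes per year
print(true_spread_bp, priced_spread_bp, shortfall)
```

Here the bank charges roughly half the spread its true risk warrants, which is exactly the prejudice to the bank that under-pricing causes.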
Against this background, this paper discusses the key issues involved in the
validation of credit risk rating systems. Emphasis will be placed on understanding the
validation techniques.
1.2 Problem Statement
Jurisdictions in the MEFMI region are at various stages of implementing Basel II and III, in
order to accurately quantify their financial risks and determine the appropriate regulatory
capital requirements. Accordingly, banks in MEFMI member states have begun to devote
resources towards the development of internal credit risk rating systems in line with current
international best practice. These efforts have been recognized and encouraged by bank
regulators in the region as noted by their phased implementation of Basel II and III.
Zimbabwe has even gone further to modify the BCBS prescribed standardised credit risk
approach, and the Reserve Bank of Zimbabwe’s Guideline No.1-2011/BSD “Technical
Guidance on the Implementation of Basel II in Zimbabwe” makes it mandatory for every
banking institution operating in Zimbabwe to have an operational internal rating system.
The modification was done in view of the fact that implementing the Basel II credit risk
standardised approach in its raw form was effectively equivalent to maintaining Basel I
for the corporate portfolio. This is so because no corporations other than banking
institutions are rated by credible external rating agencies.
Some banking institutions in Zimbabwe, however, do not conduct regular validations of
their internal credit risk rating systems, and discussions with some of the bank managers
revealed that some of the rating systems have not been validated since inception owing to a
lack of skills and appreciation of the value brought by validation exercises. Therefore the
major challenge for both banks and regulators lies in the evaluation of the accuracy of the
model’s forecasts of credit losses.
The implementation of a new standard such as Basel II and III in any jurisdiction is always
a learning process for both supervisors and banks. It is therefore critical for supervisors to
be ahead of the market by developing skills in the assessment and maintenance of rating
systems, including their shortfalls, so as to have oversight of the model risk inherent in their
financial systems.
1.3 Objectives
From the foregoing, the following objectives are set:
(i) To explore sound credit risk rating system methodologies and processes that are used
by banks globally.
(ii) To explore sound credit risk rating system validation techniques as mitigants against
model risk.
(iii) To apply sound credit risk rating system validation techniques in assessing how well
the credit risk rating models used by two banks operating in Zimbabwe forecast
defaults.
1.4 Significance of Study
Banking institutions are now using credit risk rating systems for credit granting and pricing
of loans. Further, Principles 19 to 21 of the BCBS Core Principles for Effective Banking
Supervision require supervisors to have a supervisory review process that reviews all functional
areas of a banking institution. In this regard, it is imperative for supervisors to develop an
understanding and skills in the validation of rating systems. The study will focus on credit
risk rating system validation.
The purpose of the study is to raise awareness among both regulators and bankers of the
importance and value added by implementing sound iterative validation programs for rating
systems. This will be done using an in-depth case study of two Zimbabwean banking
institutions whose rating systems have been operational for many years.
1.5 Limitations of the Study
The major limitation of the research is the sufficiency and soundness of data, including the lack
of appropriate benchmark rating systems. This limits the range of validation techniques
that can be undertaken. Further, the MEFMI region is a collection of emerging markets that
are characterised by the absence of a secondary market for credit related securities such as
credit default swaps. Such instruments enable benchmarking of internal ratings to market
implied default rates. The spectrum of borrowers in some sectors of the economy in
Zimbabwe as well as other MEFMI countries is limited, thus limiting the number of
observations for some rating categories which may result in biased computed estimates
such as PDs. Zimbabwe, like other MEFMI countries, does not have credible external rating
agencies against which banks can benchmark their internal obligor ratings. In addition, some
countries in the MEFMI region do not have reliable audited statements for all operating firms.
This introduces challenges for credit analysts and may result in an inconsistent rating
methodology.
Chapter 2: Literature Review
2.1 Introduction
Credit risk rating systems have gained prominence over the years as efficient tools that aid
banking institutions in discriminating borrowers according to their forecasted credit quality.
Adverse selection in credit extension, as was the case just before the global financial crisis,
exposes a bank not only to credit risk but also to liquidity risk, as default causes
balance sheet liquidity timing dislocations.
In this section we review the literature on credit rating systems and validation
methodologies. We discuss common credit risk assessment models used in practice to gain
understanding of their strengths and weaknesses in line with our set objectives outlined in
chapter 1. We then discuss validation techniques and benchmarking of internal rating
systems before concluding.
2.2 Credit Risk Assessment Techniques
The main objective of credit assessment personnel and credit risk managers is the
prediction of default probability of obligors based on available information on the obligor
at the start and throughout the life of a credit relationship. Generally, borrowers have more
information about their credit quality at any given point of a credit relationship than their
lenders. It is in this regard that banks, in view of this information asymmetry, limit
borrowers’ access to the bank’s credit, rather than allowing borrowers to select the sizes of
their loans without restriction. The limit on the credit granted to borrowers must however
be based on the level of default likelihood of the borrower so as to avoid adverse selection.
Thus banking institutions employ various approaches in designing credit rating systems that
discriminate between creditworthy borrowers and likely defaulters. Some of the approaches
include the following techniques:
(a) Econometric Techniques
Econometric techniques such as linear and multiple discriminant analysis, multiple
regression, logit and probit analysis all can be used to model the probability of default of
obligors. The independent variables include financial ratios and other indicators as well as
external variables used to measure economic conditions. Survival analysis refers to a set of
techniques used to measure the time to response, failure, death, or the development of an
event. These models have the weakness of being data-hungry in calibrating the model
parameters, and results are usually dependent on the data set used.
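A minimal sketch of the logit approach named above may help: a logistic model of default fitted by plain gradient descent. The data are synthetic and the feature names (leverage, interest cover) and coefficients are invented for illustration, not drawn from any bank's model.

```python
import numpy as np

# Hedged sketch of a logit PD model fitted to synthetic data.
rng = np.random.default_rng(0)
n = 500
leverage = rng.uniform(0.0, 1.0, n)   # hypothetical financial ratio
coverage = rng.uniform(0.5, 5.0, n)   # hypothetical interest cover

# Assumed true default process: high leverage, low coverage -> default.
logit = -2.0 + 3.0 * leverage - 0.8 * coverage
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(float)

X = np.column_stack([np.ones(n), leverage, coverage])
w = np.zeros(3)
for _ in range(5000):                  # plain gradient descent on log-loss
    p = 1 / (1 + np.exp(-X @ w))       # predicted PDs
    w -= 0.1 * X.T @ (p - y) / n       # gradient step

pd_hat = 1 / (1 + np.exp(-X @ w))
# A sound model should, on average, rank defaulters above non-defaulters:
print(pd_hat[y == 1].mean(), pd_hat[y == 0].mean())
```

Note the data dependence the text warns about: refitting on a different sample (a different seed) yields different coefficients, which is precisely why such models need ongoing validation.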
(b) Neural Networks
These are computer-based systems that try to mimic the functioning of the human brain by
emulating a network of interconnected neurons (the smallest decision making units in the
brain). They use the same data employed in the econometric techniques but arrive at the
decision using alternative implementations of a trial and error method. However, the
disadvantages of neural network systems include:
(i) The time and effort required to translate the human experts’ decision processes
into a system of rules may be enormous.
(ii) The difficulty and costs associated with programming the decision algorithm and
maintaining the system.
(c) Optimization models
These mathematical programming techniques discover the optimum weights for obligor
and loan attributes that minimize lender selection error and maximize profits. Such systems
have disadvantages which include difficulty in obtaining closed form solutions and at times
are difficult to implement.
(d) Rule-based or expert systems
These mimic in a structured way the process used by an experienced analyst to arrive at the
credit decision. As the name indicates, such a system tries to clone the process used by a
successful analyst so that this expertise is available to the rest of the organization. Rule-
based systems are characterized by a set of decision rules, a knowledge-base consisting of
data such as industry financial ratios, and a structured inquiry process to be used by the
analyst in obtaining data on a particular borrower.
Although many banking institutions still use expert systems as part of their credit decision
process, these systems have two main shortfalls:
(i) Consistency. In assessing inherent credit risk there are various evaluation parameters
and factors that can be used. This then creates a problem of consistency and thus two
expert systems that are based on different assessment parameters are incomparable.
(ii) Subjectivity. The weights applied to the factors and parameters are subjective as they
are based on the expert assessing the borrower and can vary from borrower to
borrower making comparability of rankings very difficult.
(e) Hybrid Systems
These systems use direct computation, estimation and simulation. These are partly driven
by a direct causal relationship, the parameters of which are determined through estimation
techniques. An example of this is the KMV model, which uses an option theoretic
formulation to explain default and then derives the form of the relationship through
estimation. Some of the systems have the disadvantage of being overly complex and
difficult to implement such as the KMV model which requires advanced expertise in
quantitative techniques to implement.
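The option-theoretic formulation behind KMV-style hybrid models can be sketched with Merton's distance-to-default, which counts how many standard deviations the firm's asset value sits above its default point; the model-implied PD is the normal tail beyond that distance. The inputs below are hypothetical.

```python
import math

# Sketch of the Merton (1974) idea underlying KMV-style hybrid models:
# equity is a call option on firm assets, and default occurs when asset
# value falls below the debt level at the horizon.

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def distance_to_default(V, D, mu, sigma, T=1.0):
    """V: asset value, D: default point (debt), mu: asset drift,
    sigma: asset volatility, T: horizon in years."""
    return (math.log(V / D) + (mu - 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))

V, D, mu, sigma = 120.0, 100.0, 0.05, 0.25   # hypothetical inputs
dd = distance_to_default(V, D, mu, sigma)
edf = norm_cdf(-dd)   # model-implied PD ("expected default frequency")
print(dd, edf)
```

In practice the asset value and volatility are themselves unobservable and must be backed out from equity data, which is where the advanced quantitative expertise mentioned above comes in.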
After an initial rating has been assigned to a borrower at the start of a credit relationship,
continual monitoring of default likelihood should be done at least annually, or as and when
significant information that affects the credit risk profile of the bank is received. The
likelihood of migration or transition across credit risk grades can be assessed
over time using transition or migration matrices.
2.2.1 Transition Matrices
Migration probability matrices are data summaries that help to predict the tendency of a
credit to migrate to lower or higher credit quality based on historically observed migration
patterns. These matrices are derived by using the cohort component analysis, i.e. observing
a group of similar credit quality borrowers through time from inception to end of a credit
relationship. Thus transition matrices help to answer questions such as “With what
probability will the credit risk rating of a borrower decrease by a given degree?” Consider a
rating system with two rating classes A and B, and a default category D. The transition
matrix for this rating system is a table listing the probabilities that a borrower rated A at the
start of a period has rating A, B or D at the end of the period; analogously for B-rated
companies. The table below illustrates the transition matrix for this simple rating system:
Table 1: Transition Matrix
The length of period of observing transitions in credits is often set to one year, but other
choices are possible, such as three months. The default category does not have a row of its
own as it is treated as an absorbing category, i.e. probabilities of migrating from D to A and
B are set to zero. A borrower that moves from B to D and back to B within the period will
still be counted as a defaulter. If we counted such an instance as ‘stay within B’, the
transition matrix would understate the danger of experiencing losses from default. The
stability of the transition matrix is dependent on the rating philosophy of the bank.
2.2.2 Rating Philosophy
The first step when developing a conceptually sound credit risk rating framework is to
decide what the credit rating should indicate, thus the rating philosophy. It is very
important for banks to decide whether they want their internal rating systems to grade
borrowers according to their current condition (point-in-time, PIT) or their expected
condition over a cycle (through-the-cycle, TTC) because the rating philosophy influences
many aspects such as: credit approval, loan pricing, early warning of defaults, volatility and
procyclicality of regulatory and economic capital, and as a result the profitability of a bank
and its competitive position.
Banks whose ratings are used primarily for underwriting purposes are likely to implement
systems that are TTC. TTC ratings will tend to remain more-or-less constant as
macroeconomic conditions change over time. On the other hand, banks whose ratings are
                             Rating at end of period
                             A                            B                            D
Rating at       A            Probability of staying       Probability of migrating     Probability of default
beginning                    in A                         from A to B                  from A
of period       B            Probability of migrating     Probability of staying       Probability of default
                             from B to A                  in B                         from B
used for pricing purposes or to track current portfolio risk are more likely to implement PIT
rating systems. PIT ratings will tend to adjust quickly to a changing economic environment.
Between these two extreme cases lie hybrid rating systems that embody characteristics of
both PIT and TTC rating philosophies. To effectively validate pooled PDs, supervisors and
risk managers will need to understand the rating philosophy applied by a bank in assigning
obligors to risk buckets.
2.3 Validation Techniques
The objective of validating a rating system is to assess whether a rating system can, and
ultimately does, fulfil its task of accurately distinguishing and measuring credit risk. There
are two dimensions along which ratings are commonly assessed, that is, discrimination and
calibration.
In checking discrimination, we assess how well a rating system ranks borrowers according
to their true probability of default (PD). When examining calibration we assess how well
the estimated PDs match realised PDs. The following example in Table 2 below illustrates
the two dimensions of rating quality assessment.
Table 2: Rating System Validation Dimensions
Borrower    Rating of System 1 (PD)    Rating of System 2 (PD)    Actual PD
B1          A (1%)                     A2 (2.01%)                 1.50%
B2          B (5%)                     B2 (2%)                    2%
B3          C (20%)                    C2 (1.99%)                 2.50%
From the table above we note that the rank ordering of rating system 1 is perfect, but the
PDs differ significantly from the true PDs. By contrast, the average PD of rating system 2
closely matches the average true PD, and individual deviations from the average PD are
small. However, it does not discriminate at all as the system’s PDs are inversely related to
the true PDs. There are various techniques that are used in practice to assess either the
discrimination or calibration power or both dimensions simultaneously for a rating system.
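The two dimensions illustrated in Table 2 can be checked mechanically. The sketch below uses the table's own PDs, with a simple rank-order comparison for discrimination and mean absolute deviation as a crude calibration measure; the helper functions are illustrative, not standard validation statistics.

```python
# Discrimination vs calibration, using the PDs from Table 2.
actual = [0.015, 0.020, 0.025]      # true PDs of borrowers B1..B3
system1 = [0.01, 0.05, 0.20]        # perfect ranking, poor PD levels
system2 = [0.0201, 0.020, 0.0199]   # good PD levels, inverted ranking

def same_rank_order(a, b):
    """Discrimination check: do the PDs rank borrowers identically?"""
    rank = lambda xs: sorted(range(len(xs)), key=lambda i: xs[i])
    return rank(a) == rank(b)

def mean_abs_error(a, b):
    """Crude calibration check: average absolute PD deviation."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(same_rank_order(actual, system1))   # True: system 1 discriminates
print(same_rank_order(actual, system2))   # False: system 2 ranks inversely
print(mean_abs_error(actual, system1))    # large: system 1 poorly calibrated
print(mean_abs_error(actual, system2))    # small: system 2 well calibrated
```

Production validation would use the CAP/accuracy ratio and statistical calibration tests discussed later, but the same two questions are being asked.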
2.3.1 Provisions by BCBS
The advent of Basel II in 2004 linked the credit risk regulatory capital requirements of banks
to banking institutions’ own internal estimates of credit risk parameters (PD, EAD and
LGD). The success of the revised accord’s credit risk advanced approaches was thus
hinged on the continued accuracy and consistency of the estimated parameters. To this end the
BCBS established the Subgroup on Validation called the Accord Implementation Group on
Validation (AIGV) in 2004. The objective of the AIGV is to share and exchange views
related to the validation of rating systems. This is aimed at ensuring that banks implement
robust and efficient validation mechanisms that enable the continued soundness of the
rating systems. In this research we will, however, only discuss validation of PDs. The
AIGV developed six important principles on validation that resulted in a broad framework
for validation. The principles were published in a BCBS newsletter of January 2005 (see
BCBS, 2005, Blockwitz and Hohl, 2006). The validation framework covers all aspects of
validation, including the goal of validation (principle 1), the responsibility for validation
(principle 2), expectations on validation techniques (principles 3, 4, and 5), and the control
environment for validation (principle 6). The validation principles are outlined below.
Principle 1: Validation is fundamentally about assessing the predictive ability of a bank’s
risk estimates and the use of ratings in credit processes
The two-step process for rating systems requires banks to firstly discriminate adequately
between risky borrowers (i.e. being able to rank borrowers by their associated
risk of loss) and secondly to calibrate risk (i.e. being able to accurately quantify the level of
risk). The IRB parameters must, as always with statistical estimates, be based on historical
experience which should form the basis for the forward-looking quality of the IRB
parameters. IRB validation should encompass the processes for assigning those estimates
including the governance and control procedures in a bank.
Principle 2: The bank has primary responsibility for validation
Supervisors do not have the primary responsibility for validating bank rating systems.
Rather, a bank has the primary role, and consequently must validate its own rating systems
to demonstrate how it arrived at its risk estimates and confirm that its processes for
assigning risk estimates are likely to work as intended and continue to perform as expected.
Supervisors, on the other hand, should review the bank’s validation processes and
outcomes and may rely upon additional processes of their own design, or even those of third
parties, in order to have the required level of supervisory comfort or assurance.
Principle 3: Validation is an iterative process
Validation is an ongoing, iterative process in which banks and supervisors periodically
refine validation tools in response to changing market and operating conditions. Banks and
supervisors will need to engage in an iterative dialogue on strengths and weaknesses of
particular rating systems.
Principle 4: There is no single validation method
Many well-known validation tools, like backtesting, benchmarking and replication, are
useful supplement to the overall goal of achieving a sound IRB system. However, there is
unanimous agreement that there is no universal tool available, which could be used across
portfolios and across markets.
Principle 5: Validation should encompass both quantitative and qualitative elements
While it might be possible to think of validation as a purely technical/mathematical
exercise in which outcomes are compared to estimates using statistical techniques, and
indeed in some circumstances such technical tools may play a critical role in such
assessments, it will likely be insufficient to focus solely on comparing predictions and
outcomes. In assessing the overall performance of a rating system, it is also important to
assess the components of the rating system (data, models, etc.) as well as the structures and
processes around the rating system. This should include an assessment of controls
(including independence), documentation, internal use, and other relevant qualitative
factors.
Principle 6: Validation processes and outcomes should be subject to independent review
It is important that a bank’s validation processes and results should be reviewed for
integrity by parties within the banking organisation that are independent of those
accountable for the design and implementation of the validation process. This independent
review is a process that may be accomplished using a variety of structural forms. The
activities of the review process may be distributed across multiple units or housed within
one unit, depending on the varying management and oversight frameworks of banks. As an
example, internal audit could be charged with undertaking this review process using
internal technical experts or third parties independent from those responsible for building
and validating the bank's rating system. Regardless of the bank's control structure, internal
audit has an oversight responsibility to ensure that validation processes are implemented as
designed and are effective.
Following the BCBS elaboration of the term “validation”, we consider three mutually
supporting ways to validate banks’ internal rating systems. Together these encompass a
range of processes and activities that contribute to the overall assessment and final
judgement. More specifically, they relate directly to the application of Principles 4 and 5 of
the BCBS newsletter discussed above.
Component-based validation: analyses each of the three elements – data collection and
compilation, quantitative procedure and human influence – for appropriateness and
workability.
Result-based validation (also known as backtesting): analyses the rating system’s
quantification of credit risk ex post.
Process-based validation: analyses the rating system’s interfaces with other processes in
the bank and how the rating system is integrated into the bank’s overall management
structure.
(a) Process-based Validation
Validating rating processes includes analysing the extent to which an internal rating system
is used in daily banking business. This use test of ratings and the associated risk estimates
is one of the key requirements of the Basel II framework. There are two different levels of
validation: firstly, the plausibility of the actual rating in itself, and secondly, the integration
of the rating output into operational procedures and its interaction with other processes.
This incorporates the following:
(i) Understanding the rating system: It is fundamental to both types of analysis that
employees understand the rating methodology used. The learning process should
not be restricted to loan officers. As mentioned above, it should also include those
employees who are involved in the rating process. In-house training courses and
other training measures are required to ensure that the process operates properly.
(ii) Importance for management: Adequate corporate governance is crucial for banks.
In the case of a rating system, this means that executive management, and to a
certain extent the supervisory board, must take responsibility for authorising the
rating methods and their implementation in the bank’s day-to-day business. We
would expect different rating methods to be used depending on the size of the
borrower, taking account of the borrowers’ different risk content and the
relevance of the incoming information following the decision by senior
management.
(iii) Internal monitoring processes: The monitoring process must cover at least the
extent and the type of rating system used. In particular, it should be possible to rate
all borrowers in the system, with the final rating allocated before credit is granted.
If the rating is given after credit has been granted, this raises doubts about the
usefulness of internal rating. The same applies to a rating which is not subject to a
regular check. There should be a check at least annually and whenever new
information about the debtor is received which casts doubt on their ability to clear
their debts. The stability of the rating method over time, balanced with the need to
update the method as appropriate, is a key part of the validation. To do this, it is
necessary to show that objective criteria are incorporated so as to lay down the
conditions for a re-estimation of the quantitative rating model or to determine
whether a new rating model should be established.
(iv) Integration in the bank’s financial management structure: Unless meaningful credit
risk data is recorded for each borrower, it is impossible to perform the proper risk
pricing of loans taking into account inherent credit risk. If this is to be part of the
loan pricing process, a relationship must be determined between the individual
rating categories and the standard risk costs. However, it must be borne in mind
that the probability of default is simply a component of the calculation of the
standard risk costs and, similarly to the credit risk models, other risk parameters,
such as the LGD and EAD should also be recorded. Ultimately the gross margin on
a loan, which approximates to the difference between lending rates and refinancing
costs, can act as a yardstick for including the standard risk costs.
(b) Result-based Validation
As stated before, the major use of rating systems is forecasting the likelihood of default
based on currently available information. Rating systems such as the ones discussed above
may be seen as classification tools in the sense that they provide indications of the
obligor’s likely future status. The procedure of applying a classification tool to an obligor
in order to assess his or her future status is commonly called discrimination.
The main construction principle of rating systems can be described as “the better a grade,
the smaller the proportion of defaulters and the greater the proportion of non-defaulters
assigned to it.” Consequently, the more the defaulters’ and the non-defaulters’
distributions across the grades differ, the better a rating system discriminates.
The discriminatory power of a rating system thus denotes its ability to discriminate ex ante
between defaulting and non-defaulting borrowers. It can be assessed using a number of
statistical measures of discrimination, some of which are described in this section.
However, an absolute measure of discriminatory power is of limited meaningfulness on its
own. A direct comparison of different rating systems, for example, can only be performed
if statistical “noise” is taken into account. In general, the smaller the available default
sample, the greater an issue this noise becomes. For this reason, statistical tools for the
comparison of rating systems are also
presented. Some of the tools, in particular the Accuracy Ratio and the Receiver Operating
Characteristic, explicitly take into account the size of the default sample. Moreover, the
discriminatory power should be tested not only in the development dataset but also in an
independent dataset (out-of-sample validation). Otherwise there is a danger that the
discriminatory power may be overstated by over-fitting to the development dataset. In this
case the rating system will frequently exhibit a relatively low discriminatory power on
datasets that are independent of, but structurally similar to, the development dataset. Hence
the rating system would have a low stability. A characteristic feature of a stable rating
system is that it adequately models the causal relation between risk factors and
creditworthiness. It avoids spurious dependencies derived from empirical correlations. In
contrast to stable systems, unstable systems frequently show a sharply declining level of
forecasting accuracy over time.
In practice, in some developing economies rating systems are used in credit-granting
decisions with a high number of overrides. In other developing countries, as well as in
developed economies, they form the basis for pricing credits and calculating risk premiums
and capital charges. For these purposes, each rating grade or score value must be associated
with a PD that gives a quantitative assessment of the likelihood with which obligors graded
this way will default. Additionally, under both IRB approaches and the Reserve Bank of
Zimbabwe credit risk modified approach, a bank’s capital requirements are determined by
internal estimates of the risk parameters for each exposure. These are derived in turn from
the bank’s internal rating scores. The set of parameters includes the borrower’s credit risk
grade and Probability of Default.
The other dimension in the validation of rating systems is to validate the calibration of the
rating system. As the risk parameters can be determined by the bank itself, the quality of
the calibration is an important prudential criterion for assessing rating systems.
Checking discriminatory power and checking calibration are different tasks. As the ability
of discrimination depends on the difference of the defaulters’ and non-defaulters’
respective distributions on the rating grades, some measures of discriminatory power
summarise the differences of the probability densities of these distributions. Alternatively,
the variation of the default probabilities that are assigned to the grades can be measured. In
contrast, correct calibration of a rating system means that the PD estimates are accurate.
Hence, when examining calibration the differences of forecast PDs and realised default
rates must be considered. This can be done simultaneously for all rating grades in a joint
test or separately for each rating grade, depending on whether an overall assessment or an
in-detail examination is intended.
At first glance, validating the calibration of rating systems appears to be a similar
problem to the back-testing of internal models for market risk. For market risk, add-ons to
the capital requirements can be immediately derived from the results of the back-testing
procedure. Past years’ experience gives evidence of the adequacy of this methodology.
However, the availability of historical data for market risk is much better than for credit
risk. Whereas market risk is measured on a daily basis (yielding samples of roughly 250
observations per year), observation intervals for credit risk are typically a year long, owing
to the rareness of credit events. For credit portfolios, ten data points of yearly default rates are
regarded as a long time series and the current Basel II proposals consider five year series as
sufficient. As a result, the reliability of estimates for credit risk parameters is not at all
comparable to the reliability of back-tests of internal market risk models.
Moreover, whereas in the case of market risk there is no strong evidence that the
assumption of independence of observations over time is violated, the analogous
assumption seems questionable for credit losses. This holds even more for cross-sectional
dependence of credit events (i.e. within the same year). As a consequence, standard
independence-based tests of discriminatory power and calibration are likely to be biased
when applied to credit portfolios.
The choice of a specific technique to be applied for validation should depend upon the
nature of the portfolio under consideration. Retail portfolios or portfolios of small- and
medium-sized enterprises with large records of default data are much easier to explore with
statistical methods than, for example, portfolios of sovereigns or financial institutions
where default data are sparse.
2.3.2 Assessing Discriminatory Power
There are various statistical methodologies for the assessment of discriminatory power (see
BCBS, 2005 and Tasche, 2009).
(a) Cumulative Accuracy Profile (CAP)
The Cumulative Accuracy Profile is also known as the Gini curve, Power curve or Lorenz
curve. It is a visual tool whose graph can easily be drawn if two representative samples of
scores for defaulted and non-defaulted borrowers are available. Concavity of the CAP is
equivalent to the property that the conditional probabilities of default given the underlying
scores form a decreasing function of the scores. Moreover, non-concavity indicates sub-
optimal use of information in the specification of the score function. The most common
summary index of the CAP is the Accuracy Ratio (or Gini coefficient). The shape of the
CAP depends on the proportion of defaulted and non-defaulted borrowers in the sample.
Hence a visual comparison of CAPs across different portfolios may be misleading.
Practical experience shows that the Accuracy Ratio (AR) tends to take values in the range
of 50% to 80%. However, such observations should be interpreted with care as they seem
to depend strongly on the composition of the portfolio and the number of defaulters in the
samples. Suppose we have an arbitrary rating model that produces obligor rating
scores. A high rating score is usually an indicator of a low default probability. To obtain
the CAP curve, all obligors are first ordered by their respective scores, from riskiest to
safest, i.e. from the borrower with the lowest score to the obligor with the highest score.
For a given fraction x of the total number of obligors the CAP curve is constructed by
calculating the percentage d(x) of the defaulters whose rating scores are equal to or lower
than the maximum score of fraction x. This is done for x ranging from 0% to 100%. Figure
1 illustrates CAP curves.
Figure 1: Cumulative Accuracy Profiles (Source BCBS Validation Group)
A perfect rating model will assign the lowest scores to the defaulters. For a random model
without any discriminative power, the fraction x of all obligors with the lowest rating
scores will contain x percent of all defaulters. Real rating systems will be somewhere in
between these two extremes. The quality of a rating system is measured by the Accuracy
Ratio (AR). It is defined as the ratio of the area a_R between the CAP of the rating model
being validated and the CAP of the random model to the area a_P between the CAP of the
perfect rating model and the CAP of the random model, given mathematically as:

AR = a_R / a_P
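By way of illustration, the CAP construction and the Accuracy Ratio described above can
be sketched in a few lines of Python. The function and the sample portfolio are our own
illustrations, not data from the banks studied in this paper; scores are assumed to be
ordered so that a higher score means a safer obligor.

```python
def accuracy_ratio(scores, defaults):
    """Accuracy Ratio (Gini coefficient) computed via the CAP curve.

    Obligors are ordered from riskiest (lowest score) to safest; d(x) is the
    cumulative share of defaulters captured in the riskiest fraction x.
    """
    n = len(scores)
    n_def = sum(defaults)
    order = sorted(range(n), key=lambda i: scores[i])  # riskiest first
    cum_def = 0
    area_model = 0.0   # area under the model's CAP, trapezoidal rule
    prev = 0.0
    for i in order:
        cum_def += defaults[i]
        d_x = cum_def / n_def
        area_model += (prev + d_x) / (2 * n)  # trapezoid of width 1/n
        prev = d_x
    area_random = 0.5                   # CAP of the random model: the diagonal
    area_perfect = 1 - n_def / (2 * n)  # perfect model flags all defaulters first
    # AR = area between model CAP and diagonal, over area between perfect CAP
    # and diagonal
    return (area_model - area_random) / (area_perfect - area_random)

# Hypothetical portfolio: defaulters (flagged 1) concentrated at low scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
defaults = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
ar = accuracy_ratio(scores, defaults)
```

A perfect rating system yields AR = 1, a random one AR = 0; the sample above produces
an AR of roughly 0.90.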
(b) Receiver Operating Characteristic (ROC)
Like the CAP, the Receiver Operating Characteristic (ROC) is a visual tool that can be
easily constructed if two representative samples of scores for defaulted and non-defaulted
borrowers are available. The construction is slightly more complex than for CAPs but, in
contrast, does not require the sample composition to reflect the true proportion of defaulters
and non-defaulters. As with the CAP, concavity of the ROC is equivalent to the conditional
probabilities of default being a decreasing function of the underlying scores or ratings and
non-concavity indicates sub-optimal use of information in the specification of the score
function (see Engelmann et al, 2003). One of the summary indices of ROC, the ROC
measure (or Area Under the Curve, AUC), is a linear transformation of the Accuracy Ratio
mentioned above. The statistical properties of the ROC measure are well-known as it
coincides with the Mann-Whitney statistic. In particular, powerful tests are available for
comparing the ROC measure of a rating system with that of a random rating and for
comparing two or more rating systems. Also, confidence intervals for the ROC measure
can be estimated with readily available statistical software packages. By inspection of the
formulas for the intervals, it turns out that the widths of the confidence intervals are mainly
driven by the number of defaulters in the sample. The more defaulters are recorded in the
sample, the narrower the interval.
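Because the ROC measure (AUC) coincides with the normalised Mann-Whitney statistic,
it can be estimated directly by pairwise comparison of defaulter and non-defaulter scores.
The sketch below, with hypothetical scores of our own, also illustrates the linear
transformation AR = 2·AUC − 1 linking it to the Accuracy Ratio.

```python
def auc(scores_nondef, scores_def):
    """ROC measure (AUC) as the normalised Mann-Whitney statistic: the
    probability that a randomly drawn non-defaulter scores higher than a
    randomly drawn defaulter, with ties counting one half."""
    wins = 0.0
    for s_nd in scores_nondef:
        for s_d in scores_def:
            if s_nd > s_d:
                wins += 1.0
            elif s_nd == s_d:
                wins += 0.5
    return wins / (len(scores_nondef) * len(scores_def))

nondef = [4, 5, 6, 7, 8, 9]   # higher scores: safer obligors
dflt = [2, 3, 5]              # defaulters mostly received low scores
a = auc(nondef, dflt)
ar = 2 * a - 1                # linear transformation to the Accuracy Ratio
```

In practice a rank-based formula (or a statistical package) replaces the O(n²) double loop,
but the pairwise form makes the Mann-Whitney interpretation explicit.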
The Pietra index is another important summary index of ROCs. Whereas the AUC
measures the area under the ROC, the Pietra index reflects half the maximal distance of the
ROC and the diagonal in the unit square (which is just the ROC of rating systems without
any discriminatory power). As is the case with the ROC measure, the Pietra index also has
an interpretation in terms of a well-known test statistic, the Kolmogorov-Smirnov statistic.
As with the ROC measure, a test for checking the dissimilarity of a rating and the random
rating is included in almost all standard statistical software packages.
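The link between the Pietra index and the Kolmogorov-Smirnov statistic can be made
concrete: the maximal vertical distance between the ROC curve and the diagonal equals
the KS distance between the defaulters’ and non-defaulters’ empirical score distributions.
A minimal sketch with illustrative data and our own function name:

```python
def ks_statistic(scores_def, scores_nondef):
    """Kolmogorov-Smirnov distance between the empirical score distributions
    of defaulters and non-defaulters. This equals the maximal vertical
    distance of the ROC curve from the diagonal."""
    grid = sorted(set(scores_def) | set(scores_nondef))

    def ecdf(sample, x):
        # empirical cumulative distribution function at point x
        return sum(1 for s in sample if s <= x) / len(sample)

    return max(abs(ecdf(scores_def, x) - ecdf(scores_nondef, x)) for x in grid)
```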
Neither the ROC measure nor the Pietra index depends on the total portfolio probability of
default. Therefore, they may be estimated on samples with non-representative
default/non-default proportions. Similarly, figures for bank portfolios with different
fractions of defaulters may be directly compared.
For both these indices, it is not possible to define in a meaningful way a general minimum
value in order to decide if a rating system has enough discriminatory power. However, both
indices are still useful indicators for the quality of a rating system.
Significance of rejection of the null hypothesis (rating system has no more power than the
random rating) with the Mann-Whitney or Kolmogorov-Smirnov tests at a (say) 5% level
could serve as a minimum requirement for rating systems. This would take care of
statistical aspects like sample size. Lower p-values with these tests are indicators of
superior discriminatory power. However, for most rating systems used in the banking
industry, p-values will be nearly indistinguishable from zero. As a consequence, the
applicability of the p-value as an indicator of rating quality appears to be limited.
The construction of an ROC curve is illustrated in the Figure below, which shows possible
distributions of rating scores for defaulting and non-defaulting obligors. For a perfect
rating model the two distributions would be completely separated. For real rating systems,
perfect discrimination is in general not possible and the two distributions will overlap, as
illustrated below.
Figure 2: Distribution of Defaulters and Non-defaulters
(Source BCBS Validation Group)
Assume someone has to find out from the rating scores which obligors will survive during
the next period and which obligors will default. One possibility for the decision-maker
would be to introduce a cut-off value C as in Figure 2, and to classify each debtor with a
rating score lower than C as a potential defaulter and each debtor with a rating score higher
than C as a non-defaulter. Then four decision results would be possible. If the rating score
is below the cut-off value C and the debtor defaults subsequently, the decision was correct.
Otherwise the decision-maker wrongly classified a non-defaulter as a defaulter. If the rating
score is above the cut-off value and the debtor does not default, the classification was
correct. Otherwise a defaulter was incorrectly assigned to the non-defaulters’ group.
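The four decision outcomes induced by a cut-off value C can be tabulated directly. The
sketch below, with hypothetical scores and our own function name, classifies each obligor
scoring below C as a potential defaulter.

```python
def classify_at_cutoff(scores, defaults, C):
    """Tabulate the four decision outcomes for cut-off C: obligors with a
    score below C are classified as potential defaulters."""
    hits = false_alarms = misses = correct_rejections = 0
    for s, d in zip(scores, defaults):
        predicted_default = s < C
        if predicted_default and d == 1:
            hits += 1                  # defaulter correctly flagged
        elif predicted_default and d == 0:
            false_alarms += 1          # non-defaulter wrongly flagged
        elif not predicted_default and d == 1:
            misses += 1                # defaulter missed
        else:
            correct_rejections += 1    # non-defaulter correctly passed
    return hits, false_alarms, misses, correct_rejections

outcome = classify_at_cutoff(
    [1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 0, 1, 0, 0, 0, 0], C=4
)
```

Sweeping C across all scores and plotting the hit rate against the false alarm rate traces out
the ROC curve itself.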
Rauhmeier (2005) and Engelmann et al (2003) show that the AUC and the AR carry the
same information. We shall therefore focus on only one of the two methodologies, namely
the CAP.
(c) Entropy Measures
Entropy is a concept from information theory that is related to the extent of uncertainty that
is eliminated by an experiment. The observation of an obligor over time in order to decide
about her or his default status may be interpreted as such an experiment. The uncertainty of
the default status is highest if the applied rating system has no discriminatory power at all
or, equivalently, if all the rating grades have the same PD. In this situation, the entropy
concept applied to the PDs of the rating system would yield high figures since the gain in
information by finally observing the obligor’s status would be large. Minimisation of
entropy measures like Conditional Entropy, Kullback-Leibler distance, and information
value is therefore a widespread criterion for constructing rating systems or score functions
with high discriminatory power. However, these measures appear to be of limited use only
for validation purposes as no generally applicable statistical tests for comparisons are
available.
The Brier score is a sample estimator of the mean squared difference of the default
indicator variables (i.e. one in case of default and zero in case of survival) in a portfolio and
the default probability forecasts for rating categories or score values. In particular, the Brier
score does not directly measure the difference of the default probability forecast and the
true conditional probability of default given the scores. Therefore, the Brier score is not a
measure of calibration accuracy. Rather, the Brier score should be interpreted as the
residual sum of squares that result from a non-linear regression of the default indicators on
the rating or score function. As a consequence, minimising the Brier score is equivalent to
maximising the variance of the default probability forecasts (weighted with the frequencies
of the rating categories). Empirical results indicate that maximising the ROC measure
entails maximisation of this variance. In this sense, the Brier score is a measure of
discriminatory power and could be used in this sense as a part of an optimisation criterion.
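As a sample estimator, the Brier score is straightforward to compute. The sketch below
uses hypothetical PD forecasts and default indicators of our own.

```python
def brier_score(pd_forecasts, defaults):
    """Brier score: mean squared difference between the default indicator
    (1 in case of default, 0 in case of survival) and the PD forecast
    assigned to each obligor."""
    n = len(defaults)
    return sum((d - p) ** 2 for p, d in zip(pd_forecasts, defaults)) / n

# Hypothetical portfolio of four obligors with their PD forecasts.
bs = brier_score([0.1, 0.2, 0.05, 0.3], [0, 1, 0, 1])
```

A lower score indicates better forecasts on the sample, but, as noted above, no standard
statistical test accompanies the metric.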
The BCBS Validation Group concluded that the Accuracy Ratio (AR) and the ROC
measure are more meaningful than the other above-mentioned indices because of their
statistical properties. For both summary statistics, it is possible to calculate confidence
intervals in a simple way. The width of the confidence interval will depend on the particular
portfolio under consideration and on the number of defaulted obligors that is available for
the purpose of estimation. As a general rule, the smaller the number of observed defaults,
the wider the confidence interval for the AR (or the ROC measure), and hence the worse
the quality of the estimate. Consequently, these tools reflect
both the quality of a rating system and the size of the samples that the rating system is built
on. Therefore, they are helpful in identifying rating systems which require closer inspection
by a supervisor. In particular, supervisors can reliably test if a rating model is significantly
different from a model with no discriminatory power. The Brier score can be useful in the
process of developing a rating system as it also indicates which of any two rating systems
has the higher discriminatory power. However, due to the lack of statistical test procedures
applicable to the Brier score, the usefulness of this metric for validation purposes is limited.
If not enough default observations for the development of a rating or score system are
available, the construction of a shadow rating system could be considered. A shadow rating
is intended to duplicate an external rating but can be applied to obligors for which the
external rating is not available. Shadow ratings can be built when the available database
contains accounting information of enough externally rated obligors. Default probabilities
for the shadow rating will then be derived from statistics for the external rating. On
samples of borrowers for which both the shadow and the external rating are available, the
degree of concordance of the two rating systems can be measured with two rank-order
statistics, Kendall’s τ and Somers’ D. Somers’ D is a conditional version of Kendall’s τ that
coincides with the Accuracy Ratio in the case of a rating system with only two categories.
For both these metrics, tests can be performed and confidence intervals can be calculated
with some standard statistical software packages. In the case of high concordance of the
shadow rating and the external rating, the shadow rating will inherit the discriminatory
power and the calibration quality of the external rating if the portfolio under consideration
and the rating agency’s portfolio have a similar structure.
2.3.3 Calibration
Validation of the calibration of a rating system is more difficult than validation of its
discriminatory power. When considering the statistical tools it is important to note that
there are several established statistical methods for deriving PDs (Probabilities of Default)
from a rating system. First, a distinction needs to be drawn between direct and indirect
methods. In the case of the direct methods, such as Logit, Probit and Hazard Rate models,
the rating score itself can be taken as the borrower’s PD. The PD of a given rating grade is
then normally calculated as the mean of the PDs of the individual borrowers assigned to
each grade. Where the rating score cannot be taken as the PD (as in the case of discriminant
analysis), indirect methods can be used. One simple method consists of estimating the PD
for each rating grade from historical default rates. Another method is the estimation of the
score distributions of defaulting borrowers, on the one hand, and non-defaulting borrowers,
on the other. A specific PD can subsequently be assigned to each borrower using Bayes’
Formula.
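The indirect assignment of a PD via Bayes’ Formula can be sketched as follows. The
discrete score distributions and the prior default probability below are purely illustrative
numbers of our own.

```python
def pd_from_score(prior_pd, f_def, f_nondef, s):
    """Bayes' Formula: posterior default probability of a borrower with
    score s, given the estimated score distributions of defaulters (f_def)
    and non-defaulters (f_nondef) and the portfolio-wide prior PD."""
    num = prior_pd * f_def[s]
    den = num + (1 - prior_pd) * f_nondef[s]
    return num / den

# Hypothetical discrete score distributions (probability of each score,
# conditional on default status); defaulters concentrate at low scores.
f_def = {1: 0.5, 2: 0.3, 3: 0.2}
f_nondef = {1: 0.1, 2: 0.3, 3: 0.6}
pd_low = pd_from_score(0.05, f_def, f_nondef, 1)   # risky score
pd_high = pd_from_score(0.05, f_def, f_nondef, 3)  # safe score
```

With a 5% portfolio-wide PD, the posterior PD for the riskiest score is roughly 21%, while
the safest score carries a posterior PD below 2%.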
In practice, a bank’s PD estimates will differ from the default rates that are observed
afterwards. The key question is whether the deviations are purely random or whether they
occur systematically. A systematic underestimation of PDs merits a critical assessment,
from the point of view of supervisors and bankers alike, since in this case the bank’s
computed capital requirement would not be commensurate with the risk it has incurred.
When independence of default events is assumed in a homogeneous portfolio, the binomial
test (most powerful among all tests at fixed level) can be applied in order to test the
correctness of a one-period default probability forecast. It is known in the literature that the
true Type I error (i.e. the probability of erroneously rejecting the hypothesis of an adequate
PD forecast) can be much larger than the nominal level of the test if default events are
correlated. Efforts to take into account dependence in the binomial test, for example, by
incorporating a one factor dependence structure and Gordy’s granularity adjustment in
order to adjust for the finiteness of the sample, yield tests of rather moderate power, even
for low levels of correlation. The binomial test can be applied to one rating category at a
time only. If (say) twenty categories are tested, at 5% significance level one erroneous
rejection of the null hypothesis “correct forecast” has to be expected. This problem can be
circumvented by applying the chi-square (or Hosmer-Lemeshow) test to check several
rating categories simultaneously. This test is based on the assumption of independence and
a normal approximation. Due to the dependence of default events that is observed in
practice and the generally low frequency of default events, the chi-square test is also likely
to underestimate the true Type I error.
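Under the independence assumption, the one-sided binomial test for a single rating grade
can be sketched as follows; the figures are illustrative, and, as noted above, correlated
defaults make the true Type I error larger than the nominal level in practice.

```python
from math import comb

def binomial_pvalue(n, d_obs, pd_forecast):
    """One-sided exact binomial test assuming independent defaults: the
    probability of observing at least d_obs defaults among n obligors if
    the forecast PD is correct. Small p-values suggest the PD is
    underestimated."""
    return sum(
        comb(n, k) * pd_forecast ** k * (1 - pd_forecast) ** (n - k)
        for k in range(d_obs, n + 1)
    )

# Grade with 100 obligors and a 1% PD forecast: observing 5 defaults would
# be rejected at the 5% level under independence.
p = binomial_pvalue(100, 5, 0.01)
```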
The normal test is an approach to deal with the dependence problem that occurs in the case
of the binomial and chi-square tests. The normal test is a multi-period test of correctness of
a default probability forecast for a single rating category. It is applied under the assumption
that the mean default rate does not vary too much over time and that default events in
different years are independent. The normal test is motivated by the Central Limit Theorem
and is based on a normal approximation of the distribution of the time-averaged default
rates. Cross-sectional dependence is admissible. Simulation studies show that the quality of
the normal approximation is moderate but exhibits a conservative bias. As a consequence,
the true Type I error tends to be lower than the nominal level of the test, i.e. the proportion
of erroneous rejections of PD forecasts will be smaller than might be expected from the
formal confidence level of the test. The test seems even to be, to a certain degree, robust
against a violation of the assumption that defaults are independent over time. However, the
power of the test is moderate, in particular for short time series (for example five years).
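A minimal sketch of the normal test statistic, assuming a roughly constant PD forecast and
independence of default rates across years; the sample figures are our own.

```python
from math import sqrt

def normal_test_z(yearly_default_rates, pd_forecast):
    """Multi-period 'normal test': standardised deviation of the
    time-averaged default rate from the forecast PD, assuming default rates
    in different years are independent. Values above roughly 1.645 reject
    the forecast at the 5% level (one-sided)."""
    T = len(yearly_default_rates)
    mean = sum(yearly_default_rates) / T
    # sample variance of the yearly default rates
    var = sum((r - mean) ** 2 for r in yearly_default_rates) / (T - 1)
    return sqrt(T) * (mean - pd_forecast) / sqrt(var)

rates = [0.02, 0.03, 0.025, 0.035, 0.03]  # five years of default rates
z_low_pd = normal_test_z(rates, 0.02)     # 2% forecast looks too optimistic
z_ok_pd = normal_test_z(rates, 0.03)      # 3% forecast is not rejected
```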
For the supervisory evaluation of internal market risk models, the so-called traffic lights
approach has proved to be a valuable instrument. This approach was introduced with the
1996 Market Risk Amendment. Banks use their internal market risk models in order to
forecast a certain amount of losses (Value-at-Risk) that will not be exceeded by the realised
losses with a high probability of 99%. Depending on the number of observed exceedances,
the so-called multiplication factor that is applied to the Value-at-Risk estimate is increased.
There is a green zone of exceedances where no increment to the multiplication factor is
necessary. In the yellow zone, the increment is effectively proportional to the number of
exceedances, whereas in the red zone the maximum value for the increment has to be
applied.
The concept of a traffic lights approach can be transferred to the validation of PD
estimates. However, it is unlikely that direct consequences for the capital requirements of a
bank can be derived from this approach. A recently proposed version of a traffic lights
approach is, in contrast to the normal test, completely independent of any assumption of
constant or nearly constant PDs over time. It can be considered as a multi-period back-
testing tool for a single rating category that is based on the assumption of cross-sectional
and inter-temporal independence of default events. The distribution of the number of
defaults in one year is approximated with a normal distribution. Based on the quartiles of
this normal distribution, the number of defaults is mapped to one of the four traffic light
colours: green, yellow, orange, and red. This mapping results in a multinomial distribution
of the numbers of colours when observed over time. Inference on the adequacy of default
probability forecasts this way becomes feasible. By construction of the tool with a normal
approximation that neglects potential cross-sectional and inter-temporal correlations, higher
than expected frequencies of type I errors (i.e. erroneous rejections of default probability
forecasts) may occur. As a consequence, this traffic lights approach is conservative in the
sense of yielding relatively more false alerts than not detecting bad calibrations. Simulation
results indicate that the traffic lights approach is not too conservative since the frequency of
false alerts can be kept under control. Furthermore the simulation study suggests that the
type II errors (i.e. the probabilities of accepting biased estimates as correct) are not higher
than those of the normal test.
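A simplified version of such a traffic lights mapping is sketched below. The colour
thresholds (the 50%, 75% and 90% points of the approximating normal distribution) are
illustrative choices of ours, not the calibration of any published proposal.

```python
from math import sqrt, erf

def traffic_light(n, defaults_observed, pd_forecast):
    """Map one year's default count to a traffic-light colour via a normal
    approximation of the default-count distribution under the PD forecast.
    Thresholds at the 50%/75%/90% quantiles are illustrative only."""
    mu = n * pd_forecast
    sigma = sqrt(n * pd_forecast * (1 - pd_forecast))
    # probability of observing at most this many defaults under the forecast
    u = 0.5 * (1 + erf((defaults_observed - mu) / (sigma * sqrt(2))))
    if u <= 0.50:
        return "green"
    if u <= 0.75:
        return "yellow"
    if u <= 0.90:
        return "orange"
    return "red"
```

Counting the colours observed over several years then yields the multinomial distribution
on which inference about the adequacy of the PD forecasts is based.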
It is worth noting that no really powerful tests of adequate calibration are currently
available. Due to the correlation effects that have to be respected there even
seems to be no way to develop such tests. Existing tests are rather conservative – such as
the binomial test and the chi-square test – or will only detect the most obvious cases of
miscalibration as in the case of the normal test. As long as validation of default
probabilities per rating category is required, the traffic lights testing procedure appears to
be a promising tool because it can be applied in nearly every situation that might occur in
practice. Nevertheless, it should be emphasised that there is no methodology to fit all
situations that might occur in the validation process. Depending on the specific
circumstances, the composition of a mixture of different techniques will be the most
appropriate way to tackle the validation exercise.
2.3.4 Benchmarking
In the context of validation, benchmarking can be defined as a comparison of internal
ratings and estimates with externally observable (whether public or non public)
information. For a banking institution’s internal rating system, examples of public
benchmarks include the ratings assigned by rating agencies such as S&P or Moody’s (see
Sobehart et al, 2000). In the case of the major rating agencies, detailed information about
the firms they rate is usually available, which makes testing and analysis of their rating
systems more feasible. Other examples of public but harder-to-analyse benchmarks include
Moody’s KMV EDFs, which are usually well disclosed but difficult to test as the
technology used is proprietary. Non-public benchmarks are typically supervisory
benchmarks that are usually not disclosed.
The most straightforward benchmarking is usually carried out for PD estimates because
they are obligor-specific and therefore it is relatively easy to define a set of borrowers
which are benchmarked. Generally, PDs are expressed on a universally understood zero-to-
one interval scale. It is more difficult to use benchmarking for EAD and LGD estimates
because they are exposure-specific, though such practice is growing. Benchmarking of the
ratings that often underlie estimates is even more difficult because of the need to map
different rating scales to a common scale.
As stated above, our discussion focuses on PD validation; accordingly, the treatment of
benchmarking here concerns PD and rating system benchmarking. In this context,
benchmarking refers to the mapping of internal ratings to an external rating system.
(a) Objectives of benchmarking
In a paper by the BCBS on Studies on the Validation of Internal Rating Systems, the
validation group agreed that no single, unambiguous and complete statistical test has
been developed that enables validation of all the facets of an internal rating system. Difficulties
mainly relate to the effect of default correlation, data constraints, and the definition of
meaningful and robust target criteria for validating rating systems. In this respect,
benchmarking is often viewed as a complement to formal statistical backtesting of internal
rating systems. As a matter of fact, benchmarking does appear in many aspects to be part of
the whole process of producing internally generated estimates in banks’ internal systems.
For example, banks frequently use external and independent references to calibrate their
own rating systems in terms of PD. However, a bank’s internal rating should reflect its
internal risk management practices, and should not be a mere replication of an external
benchmark model.
In principle, it is possible to differentiate between two ways of carrying out benchmarking
for a certain set of borrowers or exposures:
(i) Comparison of the internal estimates of risk components (e.g. PD) across a panel. For
example, banks or supervisors may wish to compare PD estimates on corporates with
respect to a peer group. The main purpose is to assess the correlation of the estimates
or, conversely, to identify potential “outliers” (this can be done using variance
analysis or robust regression), but not to determine whether these estimates are accurate.
(ii) Comparison of internal estimates with an external and independent benchmark, for
example, a rating provided by a supervisory authority or rating agency. Here the
external benchmark is implicitly given a special credibility, and deviations from this
benchmark provide a reason to review the internal estimates. In this approach, the
benchmark is used to calibrate and/or validate internal estimates. Given difficulties with
identifying absolute benchmarks, one should be critical when using benchmarks.
In either case, benchmarking appears to be part of validation but may to some extent be
more flexible, as it allows banks and supervisors to decide what benchmark is most
appropriate and to enforce decision rules on how the IRB system should behave. In this
respect, benchmarking replaces a purely formal validation process (which is not always
available) with a more empirical and operational approach.
The first approach is of particular interest to supervisors and can be pursued in any
jurisdiction where banks use internal rating systems for capital purposes or as part of their
major processes. This can be done by obtaining PD estimates for a common set of
borrowers from different banks. However, although simple to implement, this approach
raises several difficulties.
(i) A major technical problem is often the identification of common borrowers across
banks. This can be alternatively viewed as the construction of a peer group. Depending
on the information sources available, tax codes, identification codes from public or
private credit registers or manual selection may solve this technical problem.
(ii) Once these common borrowers have been identified, comparing the different ratings
across banks would require their mapping to a master scale. This issue is similar to
mapping to a benchmark as explained below.
(iii) Once PDs for a peer group have been collected, benchmarking may support further
analysis: a widely used approach is to use benchmarking to identify outliers. In this
respect, benchmarking can also be viewed as part of non parametric tests to detect
potential and systematic bias in a bank’s methodology. In practice, however, identifying
outliers may prove difficult, as differences in estimates may simply stem from
differences in methodologies. For example, PD estimates may differ because of a
different definition of default. Thus, benchmarking might be regarded more as a
variance analysis which can still be useful as it provides a qualitative indicator of
potential differences in techniques used within the peer group. Such differences need
of course to be further analysed and their impact on the banking institutions and the
system as a whole must be understood.
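The outlier screening described in (iii) can be sketched with a robust (median/MAD)
z-score. The Python snippet below is purely illustrative: the bank names, PD values and
the 3.5 cut-off are hypothetical, not taken from any peer-group exercise in this paper.

```python
from statistics import median

def flag_outlier_banks(pd_estimates, threshold=3.5):
    """Flag banks whose PD estimate for a common borrower deviates markedly
    from the peer-group consensus, using the modified z-score
    (0.6745 * deviation from the median / median absolute deviation)."""
    values = list(pd_estimates.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # all estimates (nearly) identical: nothing to flag
        return []
    return [bank for bank, v in pd_estimates.items()
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical peer-group PD estimates for one common borrower:
peers = {"Bank A": 0.010, "Bank B": 0.012, "Bank C": 0.011, "Bank D": 0.050}
print(flag_outlier_banks(peers))  # Bank D stands apart from the peer group
```

As the text stresses, a flag raised this way signals a methodological difference to
investigate (for example a different definition of default), not necessarily an error.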
Pursuing the second approach, involving an external benchmark, raises two major
concerns:
(i) The selection of the benchmark: selecting an appropriate benchmark may not be such an
obvious exercise as it requires some prior knowledge or some inference of the features
of the underlying model. Choosing a benchmark PD, for example, depends upon
whether the PD analysed is stressed or unstressed, and dynamic or static.
(ii) The mapping to the benchmark: the mapping refers to the one-to-one relation that can
be inferred between the unobserved model and its benchmark. Ideally, in the case of a
perfectly matching benchmark, this relationship will be perfectly one-to-one, but this
will not be true in the general case. As such, formalising this relationship may be quite
difficult.
As a matter of fact, whether used with a relative or absolute objective, a comparison with
an external benchmark may in practice still appear to be rather subjective regarding the last
two aspects, i.e. what benchmark to use and how to use it, the difficulty reflecting the fact
that both issues are related. Most of the benchmarks used are, for example, PDs from rating
agencies or commercial vendor models, often regardless of their true adequacy. As a
consequence, mapping procedures may often be simple and misleading.
Yet, benefits of using benchmarking as a complement to validation could be greater if a
more objective approach to constructing decision rules for banks and supervisors was used.
Moreover, if benchmarks and mapping methodologies were available, validation could be
less costly and burdensome. Supervisory evaluation could then focus on assessing the
quality of the benchmark and the quality of the mapping. In addition, it could allow useful
inference on the underlying characteristics of the IRB system with respect to the
benchmark. The following sections look at the need to formalise the selection of
benchmarks and the mapping procedure.
Moreover, there is no clear distinction between the range of PD values that a grade spans
and the fact that the distribution of PD values of obligors assigned to the grade may not be
uniform. For example, even though a grade might be properly defined to span the PD
interval [.01,.02), and the average PD estimated for the grade might be .015, the true
average PD for the obligors assigned to the grade at any given time might be higher or
lower depending on the distribution of individual obligor PDs. Moreover, changes in the
distribution over time might change the true average PD.
(b) Selection of Benchmarks
In practice, credit risk modeling in banks follows a rather “bottom up” approach.
Notwithstanding reasons related to the empirical calibration of models, the segmentation of
credits by portfolios (bank, sovereign, corporate, etc.) is often justified by an operational
reality; banks develop different risk management and commercial techniques depending on
the business activity. This segmentation actually reflects the fact that credit risk is governed
by economic factors which are specific to the type of portfolio.
In this respect, the segmentation of portfolios according to specific economic
characteristics entails that the underlying default characteristics are also different. This
means, for example, that the factors governing risks on, say, banks are not necessarily
the same, and do not necessarily follow the same dynamics, as those governing, say, corporates.
It appears then that in a bottom-up approach, which is most likely to be used by banks, a
portfolio’s segmentation would necessitate specific default models which are deemed to be
quite different. This observation has two major implications:
(i) With respect to the selection of benchmarks, considering ratings benchmarking as a
mapping to an external rating system entails that the selection of an appropriate
benchmark would rest upon the assessment of its qualities in adequately representing
the expected economic characteristics of the portfolio studied. For example, many
banks use rating agencies grades as benchmarks not only for their corporate portfolios
but also more extensively for their SME and SME retail portfolio. It is worth noting
that selecting a benchmark is dependent on the relationship of the portfolio with the
general economic environment. One should question whether benchmarking PD
estimates of SME portfolios on say, S&P ratings, is consistent (in terms of granularity,
calibration, discriminative power, etc.). For example, some banks would use the same
benchmark (e.g. S&P), to classify risks for their corporate, SME and SME retail
portfolios. One may therefore question whether the granularity of a rating system
benchmarked on S&P ratings (about 20 classes) is suitable for describing SME and
retail SME risks. It appears likely that this granularity would be excessive for SME
risks, thus entailing the possibility of non-significant or non-discriminative risk
buckets. Conversely, for retail SME the excess granularity is expected to be even
greater, thus entailing a likely too low discriminative power.
(ii) With respect to the aggregation on a master scale, an issue may also arise
regarding the consistency of aggregating the specific underlying default models into a
master default model, and ultimately the consistency of the master default model obtained.
Overall, inconsistencies in the rating criteria, dynamic properties, and granularity of two
rating systems make benchmarking exercises involving such disparate systems
operationally and conceptually difficult, which reduces the value of such exercises. Thus,
consistency in benchmarks is desirable. Benchmarking generally requires a mapping
procedure, i.e. rules relating unambiguously one rating system to the other, to be defined.
Unfortunately, mapping procedures often appear to be rather simple or crude. Most of the
time, this mapping rests on empirical comparisons of average PDs as a basis for grouping
and matching risk buckets on a master scale. A distinction should be made between the
range of PD values that a grade spans and the fact that the distribution of PD values of
obligors assigned to the grade may not necessarily be uniform.
While simple to implement, such an approach may not seem satisfactory on theoretical
grounds and with respect to validation purposes for two reasons. First, the PDs compared
are not necessarily homogenous. They depend on the definition of default used, the rating
philosophy (TTC or PIT), and the conditioning (stressed, unstressed). These properties are
linked to the underlying default model and would need to be inferred in the first place as
suggested by the need to classify rating systems according to their dynamic properties.
Second, even in the case where the PDs compared are homogenous, this approach does not
take into account the granularities of each rating system (see above) which are proxies of
the true distribution of values of the underlying default model. The problem stems from the
fact that the distribution of obligors on a bucket is not observed, only an average PD.
Merging buckets on the ground of average PD implies an assumption that merging the
corresponding distribution does not alter the resulting average PD. This may be true, but is
not in the general case. Regarding PD, the problem of mapping could be considered as
optimising the granularity of the master scale in order to minimise the loss of information.
With respect to validation, special attention should therefore be given to the appropriate
granularity of the master scale and benchmark used. As mentioned before, this would need
some inference of the economic characteristics of the benchmark or master model.
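The point about merging buckets can be made concrete with a small numerical sketch (the
bucket sizes and PDs below are hypothetical): the average PD of a merged bucket depends
on how obligors are distributed across the source buckets, which is exactly the
information an average-PD mapping discards.

```python
def merged_average_pd(buckets):
    """Obligor-count-weighted average PD of merged rating buckets.
    Each bucket is a (number_of_obligors, average_pd) pair."""
    total = sum(n for n, _ in buckets)
    return sum(n * pd for n, pd in buckets) / total

# Two buckets with average PDs of 1% and 2%:
print(merged_average_pd([(100, 0.01), (100, 0.02)]))  # 0.015 with equal populations
print(merged_average_pd([(300, 0.01), (100, 0.02)]))  # 0.0125 when obligors cluster in the safer bucket
```

The merged average moves with the obligor distribution even though the two bucket-level
PDs are unchanged, illustrating why matching buckets on average PD alone can mislead.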
For example, consider a bank’s portfolio specialised in corporate and SME lending.
Assume the risks on the SME portfolio are discriminated using a 10-grade rating scale,
while the risks on the corporate portfolio are discriminated using a 20-grade rating scale.
On both portfolios, the bank uses specific default models to derive its PD estimates. In
theory if the bank wishes to know its total risks on the whole portfolio, it would need to
somehow aggregate the two sub portfolios. One way to do this is to calculate separately
capital requirements on each sub portfolio and to aggregate them. This approach raises
consistency issues between models (see below). A more usual approach is to build a master
scale and its corresponding master default model. If the master scale has only 15 grades,
then information will be lost on the risk distribution of the underlying corporate risks. If 20
grades are used, then non significant or redundant risk buckets may be added for the
description of the underlying SME risks. Overall the discriminative power of the master
scale is likely to be affected. In the figure below we illustrate possible scenarios
that can result from benchmarking a rating system. Suppose we plot the distribution of
the internal and external grades on the same axis, from less risky to more risky. The
figure below shows possible trends that can arise from such an exercise.
Figure 3: Plot of Internal vs External Rating System
If the internal rating system tracks the benchmark consistently, the dispersion of the
data points around a common trend line with negative gradient should be marginal. The
greater the dispersion, the greater the disagreement between the rating systems; under
the underlying assumption that the benchmark is accurate, this implies that the internal
rating system is inaccurate.
If the plot shows two parallel trend lines, as in Trend 2 above, then the internal
rating system is either conservative or generous depending on the position of the
internal rating system’s trend line relative to the benchmark trend line. If the
benchmark lies above the internal rating scale then the internal rating scale is
conservative, and vice versa.
If the trend lines of the benchmark and the internal rating system intersect, then the
internal rating system is conservative over some range of risk levels while generous
over others.
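One simple way to quantify the agreement pictured above is a rank correlation between
the internal and external grades of common obligors. The sketch below computes
Spearman’s coefficient by hand; it assumes grades have been coded as numbers without
ties (tied grades would require average ranks), which is an assumption, not a step
prescribed in this paper.

```python
def spearman(internal, external):
    """Spearman rank correlation between two gradings of the same obligors.
    +1: perfect agreement in ordering; -1: perfectly reversed ordering."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r
    rx, ry = ranks(internal), ranks(external)
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0: internal tracks the benchmark
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0: orderings fully disagree
```

A coefficient near 1 corresponds to marginal dispersion around the common trend line;
values well below 1 correspond to the wider scatter discussed above.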
In this chapter we were able to review and discuss tools and techniques on rating systems
and their validation. In the next chapter we present the methodology that we are going to
use in facilitating the assessment and validation of rating systems.
Chapter 3: Methodology
3.1 Introduction
We use obligor internal ratings data from two banking institutions to undertake
validation of their internal ratings and, in turn, of their credit rating systems. The
banks operate in Zimbabwe and their rating systems and parameters were calibrated over
time. We shall reference
the two banks as Bank 1 and Bank 2. Below we give a description of the ratings data
constructed from the internal rating systems of both banks before explaining the
methodology we will employ in validation.
3.2 Data Description
3.2.1 Bank 1
Bank 1 has an internal rating system that has five rating grades and an associated
Probability of Default (PD) master scale. Four grades are non-default grades while one is a
default grade. It is worth noting that the rating system’s number of rating grades is
below the recommended minimum of seven non-default grades plus one default grade. This
is based on the provisions of paragraph 403 of the Basel II document. The BCBS justifies
the provision by explaining that it ensures that there is no concentration of
obligors in some rating grades. Lack of granularity in a rating system thus causes
concentration of obligors in some grades and in turn may compromise the discriminatory
power of the rating system.
The rating system was calibrated using the bank’s credit data spanning five years. The
calibration of PDs using five years data is in line with Basel II provisions of paragraph 264
which requires a data history of at least five years.
Bank 1 classifies its obligors into the A, B, C, D, and E grades, where grade E is a default
grade.
The bank has created a dummy rating class that it terms E2. These are accounts that
exhibit structural weaknesses and need close attention; the bank prudentially classifies
this grade as E for reporting purposes. E2 contains obligors that have not defaulted by
definition, but that the bank perceives as certain to default in the near future.
The scorecard used by the bank considers three factors, namely, financial factors, industry
specific factors and obligor specific factors such as corporate governance and management
structures.
The financial factors considered are:
(i) capacity to service facility if extended. This is inferred using the obligor’s
business financial ratios such as the Return on Capital (ROC), interest coverage
ratio, and the solvency ratio;
(ii) ability to service facilities when the business cycle is on the down turn. This is
inferred using the debt to equity ratio and the current ratio. Generally the factors
considered assess the entity’s correlation with the business cycle;
(iii) counterparty’s ability to raise additional funding. This is assessed by the quality
of management and ownership including if the entity has access to public funds
through the stock market; and
(iv) historical performance of the institution. To analyse this, the bank uses at least
three years data of the historical performance of the entity. The bank insists on
audited financials but in the absence of such it uses unaudited financials
conservatively.
The bank uses the same scorecard for its SME retail portfolio just as for corporations. The
use of the same scorecard for the SME retail portfolio and its corporate portfolio may
distort obligor migrations across grades as these portfolios have different idiosyncratic as
well as systemic factors that affect their performance.
(a) Portfolio Distribution
The bank’s distribution of ratings was as given in the figure below:
Figure 4: Ratings Distribution by Class
(b) Ratings Classification
The grade E is defined as the default grade according to the Basel II definition as it also
contains obligors that are 90 days past due. The crude default rates in Table 3 below show
that there were 25 defaults in the portfolio, all from A and D classes.
Table 3: Crude Default Rates
As discussed above, the few rating grades have resulted in a concentration of ratings in
the first grades. Rating class A contains 75.28% of total non-defaulted obligors in the
portfolio while rating class B contains 21.21% of total non-defaulted obligors. Such a
scenario is typical of rating systems that lack granularity. The concentration of
obligors in the first class poses a danger of concentration of defaults in the first
class, as reflected above.
Rating Class   Total in Class   Expected No. of Defaults/year   Default Rate (Annualised)   Master Scale PD
A              6488             84                              1.29%                       0.009
B              1828             0                               0.00%                       0.0045
C              623              0                               0.00%                       0.00895
D              139              15                              11%                         0.01845
3.2.2 Bank 2
Bank 2’s internal rating system discriminates counterparties mainly with respect to the
counterparty’s financials. The rating system has five grades, four non-default grades and
one default class, as shown in Table 4.
Table 4: Client Quality Classification
In forecasting obligor default, Bank 2’s rating system takes into consideration both obligor
specific characteristics and systemic factors that are likely to affect the credit worthiness of
the obligor.
The factors include:
(i) financial status, which is assessed using turnover and profitability, liquidity
and leverage ratios;
(ii) ability to service and repay the facility, which is assessed using cash flow
projections provided by the counterparty;
(iii) general factors that are systematic in nature, such as industry and product
type; and
(iv) management quality.
Grade   Description   Map to S&P Rating Scale   S&P PD Master Scale
A       Strong        AAA                       0.01%
B       Good          AA-A                      1.05%
C       Acceptable    BBB-B                     5.00%
D       Marginal      B-CCC                     22%
E       Default       D                         100%
Portfolio Distribution
The bank’s distribution of ratings is as given in the figure below:
Figure 5: Ratings Distribution by Class
3.3 Ratings and Rating Model Validation
As discussed in chapter 2, ratings validation is the backtesting of the final assigned ratings,
and assessment of associated long-run average one-year default probabilities (from the
master scale) against the average realised one-year default rate. Ratings validation focuses
on establishing:
(i) whether the rating system discriminates well between defaulters and non-defaulters
ex-ante;
(ii) whether long-run average default rates are consistent with the PDs assigned to the
rating grades; and
(iii) whether rating migrations are consistent with the stated rating philosophy.
The first focus of validation stated in (i) above relates to tests of the discriminatory
power of a rating system. As discussed in chapter 2, this assesses whether use of the
rating system adds value in forecasting default ex-ante. Focus (ii) relates to the test
of calibration of the rating system, while (iii) refers to the stability of the rating
system. As stated in chapter 2, use of an unstable rating system for regulatory capital
usually results in procyclical capital levels.
3.3.1 Test for Discrimination
(a) Cumulative Accuracy Profile (CAP)
In section 2.2.3 of chapter 2 we gave a conceptual overview of the CAP approach and its
associated ratio AR. The CAP is essentially based on the notion that if the rating system
has discriminatory power then all defaults must occur from the lower classes. The
construction of the CAP was discussed in section 2.2.3 of chapter 2. We however, recall the
formular of AR below:
Where represents the area between the CAP curve of the rating model being validated
and the CAP curve of the random rating, represents the area between the CAP of the
perfect rating model and the CAP curve of the random rating system.
From Figure 1 in chapter 2 we note that the CAP curves are bounded within the unity area
square (between (0,0) and (1,1)). The CAP curve of the random rating model bisects the
square into half, thus the total area of the lower triangle is 0.5. Therefore we have the area:
12
Now, given that in the ideal case we have all defaults being assigned the lowest scores,
then the point at which the curve of the perfect model first hits 1 on the y-axis should be
equal to the probability of default of the validation sample, say p. Therefore, the area is
less than 0.5 by the area of the triangle given by points (0,0),(0,1) and(p,1). Thus:
Page 41 of 63
12
12
121
Therefore:
12
12 1
2 1
1
A programming function can be written using the trapezium rule in Visual Basic (see
Appendix 1).
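As an illustration, the same computation can be sketched in Python (an alternative to
the Visual Basic routine of Appendix 1, not a reproduction of it). This per-obligor
version assumes distinct scores; tied grades would need bucket-wise treatment.

```python
def accuracy_ratio(scores, defaults):
    """AR = a_R / a_P computed with the trapezium rule.
    scores: riskiness score per obligor (higher = riskier);
    defaults: 1 if the obligor defaulted, else 0."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n, total_defaults = len(scores), sum(defaults)
    xs, ys, cum = [0.0], [0.0], 0
    for k, i in enumerate(order, start=1):
        cum += defaults[i]
        xs.append(k / n)                 # cumulative share of all obligors
        ys.append(cum / total_defaults)  # cumulative share of defaulters
    # Trapezium rule: area under the model's CAP curve.
    auc = sum((xs[j] - xs[j - 1]) * (ys[j] + ys[j - 1]) / 2
              for j in range(1, len(xs)))
    a_r = auc - 0.5          # area above the random model's CAP (the diagonal)
    p = total_defaults / n
    a_p = (1 - p) / 2        # area for the perfect model
    return a_r / a_p

print(accuracy_ratio([5, 4, 3, 2], [1, 1, 0, 0]))  # 1.0 for a perfectly discriminating ordering
```

A rating system that assigns the worst scores to all defaulters yields AR = 1, while a
system that ranks defaulters as the safest obligors yields AR = -1, consistent with the
negative AR reported for Bank 1 in chapter 4.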
3.3.2 Test for Calibration
(a) Binomial Test
We use the Binomial test to investigate whether PDs are in line with observed default rates.
This is done under the hypothesis: the PDs are not underestimated.
The test is done using master scale PDs and these are tested against observed default rates
over a given period.
Given the nature of the rating system the test is conducted under the following
assumptions:
(i) Defaults are not correlated.
(ii) Default occurrences are mutually exclusive events.
(iii) Defaults are independent of time, that is, default probabilities are constant across
time.
Let D represent the number of defaults that occurred in a given period and N the total
number of counterparties in the class at the beginning of the year. Further, let PD be
the default probability forecast by the PD Master Scale as the likelihood of default
over the period.
For a class with fewer than 30 counterparties, the hypothesis that the PD is not
underestimated is rejected if:

    Bin(D - 1, N, PD) >= 1 - α

where α is the significance level, which can be 1%, 5% or 10%, and Bin(r, n, p)
represents the cumulative binomial probability of observing at most r successes out of
n trials with probability of success p.

If a class has 30 or more counterparties, a Normal approximation to the Binomial
distribution is used and the above hypothesis is rejected if:

    Φ( (D - N·PD - 0.5) / sqrt(N · PD · (1 - PD)) ) >= 1 - α
Where Φ is the cumulative standard normal distribution.
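A minimal Python sketch of this decision rule follows. The 30-counterparty cut-off and
the continuity correction are as described above; the example inputs are the Bank 1
class figures reported in chapter 4.

```python
from math import comb, erf, sqrt

def phi(x):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pd_underestimated(D, N, PD, alpha=0.01):
    """True if the hypothesis 'PD is not underestimated' is rejected,
    i.e. the observed defaults D out of N obligors are too many for PD."""
    if N < 30:
        # Exact cumulative binomial: reject if Bin(D - 1, N, PD) >= 1 - alpha.
        cdf = sum(comb(N, k) * PD**k * (1 - PD)**(N - k) for k in range(D))
        return cdf >= 1 - alpha
    # Normal approximation with continuity correction for larger classes.
    z = (D - N * PD - 0.5) / sqrt(N * PD * (1 - PD))
    return phi(z) >= 1 - alpha

# Rating classes A and B of Bank 1:
print(pd_underestimated(84, 6488, 0.009))   # True: class A PD is underestimated
print(pd_underestimated(0, 1828, 0.0045))   # False: class B PD is not underestimated
```

These two calls reproduce the PDU and PDNU conclusions reported for classes A and B in
chapter 4.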
(b) One Factor Test
If the assumption of independence in the Binomial test is relaxed, we can use a one factor
test that also incorporates systemic factors as possible causes of default. Here we assume
that the deviation of the observed PD from the long run average may be a result of systemic
factors such as a bad macroeconomic environment that affected all borrowers.
Let y represent the percentage change in the value of the assets of a borrower over a
one-year period. We assume that this change is driven by both an idiosyncratic factor
and a single systemic factor. Let ω and Z be the idiosyncratic and systemic factors
respectively, both of which are standard normal variables. Therefore we have the
relation below:

    y = sqrt(ρ) · Z + sqrt(1 - ρ) · ω

where ρ is the correlation of the borrower with the systemic factor. If we let µ be the
percentage change below which the borrower would default, then we have:

    P(y < µ) = P( sqrt(ρ) · Z + sqrt(1 - ρ) · ω < µ )
             = P( ω < (µ - sqrt(ρ) · Z) / sqrt(1 - ρ) )

By construction y is also standard normally distributed, and since the PD for the
obligor is known from the PD Master Scale this gives:

    µ = Φ⁻¹(PD)

where Φ(·) represents the cumulative standard normal distribution. Therefore,
conditional on the systemic factor Z:

    P(y < µ | Z) = Φ( (Φ⁻¹(PD) - sqrt(ρ) · Z) / sqrt(1 - ρ) )

P(y < µ | Z) can be interpreted as the default rate observed for the class, thus:

    P(y < µ | Z) = D / K

where D is the number of defaults observed over the year in the rating class and K is
the total number of obligors in the class. Therefore:

    D / K = Φ( (Φ⁻¹(PD) - sqrt(ρ) · Z) / sqrt(1 - ρ) )

and solving for Z:

    Z = ( Φ⁻¹(PD) - sqrt(1 - ρ) · Φ⁻¹(D / K) ) / sqrt(ρ)

As stated above, Z follows a standard normal distribution, therefore the probability of
realising a value no greater than Z is given by Φ(Z). At a significance level α we
reject the master scale PD if:

    Φ( ( Φ⁻¹(PD) - sqrt(1 - ρ) · Φ⁻¹(D / K) ) / sqrt(ρ) ) ≤ α
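A sketch of this test in Python, using the standard-normal inverse from the `statistics`
module. The 7% asset correlation used in the example calls is the assumption applied to
Bank 1 in chapter 4; the inputs are the class A and D figures reported there.

```python
from math import sqrt
from statistics import NormalDist

def one_factor_reject(D, K, PD, rho, alpha=0.01):
    """True if the master scale PD is rejected at level alpha, i.e. the
    systemic factor realisation Z implied by the observed default rate D/K
    is too extreme to reconcile the observations with the assigned PD."""
    nd = NormalDist()
    z = (nd.inv_cdf(PD) - sqrt(1 - rho) * nd.inv_cdf(D / K)) / sqrt(rho)
    return nd.cdf(z) <= alpha

# Bank 1, classes A and D, assuming a 7% correlation to the systemic factor:
print(one_factor_reject(84, 6488, 0.009, 0.07))   # False: class A excess explained by the downturn
print(one_factor_reject(15, 139, 0.01845, 0.07))  # True: class D PD still underestimated
```

These two calls reproduce the chapter 4 conclusion that the systemic factor accounts for
the class A failure of the Binomial test but not for the class D failure.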
(c) Assessing Rating Philosophy
To assess the rating philosophy of the rating system, i.e. whether the ratings are
Point-in-Time (PIT), Through-the-Cycle (TTC) or a hybrid, we calculate mobility indices
from the migration matrices. The mobility indices commonly used in industry are M_P and
M_SVD, defined mathematically as:

    M_P(P) = (N - trace(P)) / (N - 1)

    M_SVD(P) = (1/N) · Σ_{i=1..N} sqrt( λ_i(P~' P~) ),   with P~ = P - I
In the equations above, P denotes the migration matrix and p_ij is the percentage of
companies in rating class i that are in class j one period later. Accordingly, the entry
p_ii on the diagonal is the percentage of companies that stays in rating class i. I is
the identity matrix, i.e. a stable migration matrix without changes of rating classes,
and λ_i(M) denotes the eigenvalues of a matrix M. The higher the index (say above 0.5)
the less stable the transition matrix is, and the lower the index (say below 0.25) the
more stable the transition matrix.
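A sketch of the two indices in Python. NumPy's SVD supplies the square roots of the
eigenvalues of P~' P~ directly; the 2x2 matrix used in the example is purely
illustrative and is not taken from either bank's data.

```python
import numpy as np

def mobility_indices(P):
    """Return (M_P, M_SVD) for a square migration matrix P.
    M_P compares the diagonal mass to perfect stability (P = I);
    M_SVD averages the singular values of P - I."""
    P = np.asarray(P, dtype=float)
    N = P.shape[0]
    m_p = (N - np.trace(P)) / (N - 1)
    # Singular values of P - I are sqrt(eigenvalues of (P - I)' (P - I)).
    m_svd = np.linalg.svd(P - np.eye(N), compute_uv=False).sum() / N
    return m_p, m_svd

print(mobility_indices([[1.0, 0.0], [0.0, 1.0]]))  # both indices are 0: perfectly stable ratings
print(mobility_indices([[0.9, 0.1], [0.2, 0.8]]))  # modest mobility
```

For the identity matrix both indices are zero, matching the interpretation of I as a
stable migration matrix above.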
Chapter 4: Results
4.1 Introduction
In this chapter we present the major results of our study based on the ratings produced by
the rating systems described and methodologies discussed in chapter 3.
4.2 Analysis of Bank 1 Ratings Data
4.2.1 Ratings Migration
The following is a migration matrix from the ratings data over a quarter:
Table 5: Migration Matrix
The transition matrix shows that the default class, E, does not represent a credit quality
state that is absorbing. Thus, the default class does not represent total inability of a
counterparty to meet an obligation but represents delinquent behaviours of counterparties.
This is supported by the 0.92 probability of counterparties rated D migrating to the A class
which represents the highest credit quality.
Generally, all classes that are close to the default class E, have a non-zero probability of
migrating into rating class A. This shows that the bank may be placing greater emphasis on
loan classifications when determining borrower rating. This presents a challenge since
rating migrations will become unstable and the default class will not be absorbing. This
may also reflect the bank’s own interpretation of the definition of default. Though the
bank’s definition is equivalent to the Basel II definition of default the bank’s interpretation
and application may not be consistent with the principles outlined in chapter 2.
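Whether a default class is absorbing can be checked directly from a migration matrix.
The sketch below is illustrative only: the 3-state matrix is hypothetical (the E row is
not reported in Table 5 above), as is the helper name.

```python
def is_absorbing(P, j):
    """True if rating class j is absorbing: once in class j, an obligor
    stays there with probability 1 (row j is a unit vector at position j)."""
    row = P[j]
    return row[j] == 1.0 and all(x == 0.0 for k, x in enumerate(row) if k != j)

# Hypothetical 3-state matrix where the last class (default) is absorbing:
P = [[0.90, 0.08, 0.02],
     [0.10, 0.80, 0.10],
     [0.00, 0.00, 1.00]]
print(is_absorbing(P, 2))  # True
print(is_absorbing(P, 0))  # False
```

Applied to Bank 1's matrix, such a check would flag the non-absorbing default class
discussed above, since rated-D obligors return to class A with probability 0.92.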
The mobility index MP, for the matrix above, was found to be 1.21 implying a PIT rating
Initial Rating   Final Rating
                 A       B       C       D       E
A                0.309   0.518   0.142   0.027   0.003
B                0.954   0.013   0.033   0       0
C                0.981   0.008   0.002   0.01    0
D                0.921   0.007   0       0.043   0.029
system. A PIT rating philosophy may be appropriate for credit-granting decision making,
if used in conjunction with continuous monitoring between rating updates, but may not be
very useful for purposes of setting either regulatory capital (RC) or economic capital (EC).
A PIT rating system can at times understate capital as the business cycle varies, leading to a
bank holding less capital than the risk inherent in its portfolio. Further, a purely PIT rating
system for credit portfolio management is costly to maintain. When using such a system,
obligors must be re-rated frequently otherwise the obligors’ PDs will not reflect the current
expectation of default likelihood.
4.2.2 Tests for Discrimination
Figure 6: CAP for Ratings Data from Bank 1
The CAP plot shown above highlights that the CAP curve for the internal rating scale is
convex rather than concave. Section 2.3 highlighted that such curvature of a CAP curve
means that the internal rating scale is sub-optimal in applying available information in
rating. This is also reinforced by the CAP curve’s AR of -0.05. This may also result
from the bank using facility ratings or loan classifications, which measure delinquent
behaviour, in coming up with obligor ratings.
4.3 Tests for Calibration
(a) Binomial Test
A Binomial test of the master scale PDs on the portfolio gives the results in the following
table:
Table 6: Results of the Binomial Test
where PDU means that the PD is underestimated and PDNU means that the PD is not
underestimated. In the above table we reject the null hypothesis at the 1% significance
level if the p-value is less than 1%.
The table shows that, at a 1% significance level, the Master Scale PDs underestimate the
observed default rates for rating classes A and D of the portfolio.
However, observed PDs of the portfolio for the quarter are not underestimated by the PDs
of the master scale for the B and C classes at the 1% significance level.
It should be noted, however, that the lack of granularity of the rating scale resulted
in concentration in the first grades and in turn elevated the likelihood of a deviation
of observed PDs from their long-run average. Further, the inclusion in the portfolio of
SME retail obligors, who are highly correlated with the business cycle, may also have
contributed to the large number of defaults in class A.
The Binomial test above, as a result of the independence of defaults assumption, only
considered idiosyncratic factors. That is factors that are obligor specific with regards
default. We therefore consider a one factor test below.
Rating Class Total in ClassExpected No. of Defaults/ year
Default Rates (Annualised)
PD Master Scale p-Value (at 1% level)
Conclusion
A 6488 84 1.3% 0.9% 0.000482 PDU
B 1828 0 0.0% 0.5% 0.998853 PDNU
C 623 0 0.0% 0.9% 0.995126 PDNU
D 139 15 11.0% 1.8% 0.000000 PDU
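For reference, the one-sided binomial test applied above can be sketched in a few lines of Python. This is a minimal illustration under the independence-of-defaults assumption, not the implementation used in the study; the function name is ours.

```python
from math import comb

def binomial_pvalue(n, k, pd):
    """One-sided binomial test p-value: P(X >= k) with X ~ Binomial(n, pd).

    H0: the master scale PD does not underestimate the true default
    probability; a small p-value suggests underestimation (PDU).
    """
    # P(X >= k) = 1 - P(X <= k - 1); summing over i < k keeps the loop short.
    cdf = sum(comb(n, i) * pd**i * (1 - pd)**(n - i) for i in range(k))
    return 1.0 - cdf

# Class A of Table 6: 84 defaults among 6488 obligors, master scale PD 0.9%.
print(binomial_pvalue(6488, 84, 0.009))  # small p-value: reject H0 (PDU)
```

Exact binomial p-values may differ slightly from those in Table 6, which appear consistent with a normal approximation to the binomial distribution (for instance, P(X >= 0) is exactly 1 for the zero-default classes).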
(b) One Factor Test
If the assumption of independence in the binomial test is relaxed, we can use a one-factor
test that also incorporates systematic factors as possible causes of default. This allows us
to test whether the excess defaults were a result of the economy being in a downturn.
Applying the one-factor test to the classes for which the master scale underestimated the
actual default rate, and assuming a correlation of 7% with the business cycle, gives the
results shown in the table below.
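Under these assumptions, the one-factor p-value can be sketched as follows. This is a hedged illustration based on the asymptotic Vasicek one-factor model; the function and variable names are ours, and the correlation of 7% is the value stated above.

```python
from statistics import NormalDist

def one_factor_pvalue(dr_obs, pd, rho):
    """p-value of an observed default rate under a Vasicek one-factor model.

    dr_obs: observed (annualised) default rate; pd: master scale PD;
    rho: asset correlation with the single systematic factor.
    """
    nd = NormalDist()
    z = ((1 - rho) ** 0.5 * nd.inv_cdf(dr_obs) - nd.inv_cdf(pd)) / rho ** 0.5
    return 1 - nd.cdf(z)

# Class A: observed DR 1.3% against a master scale PD of 0.9%, rho = 7%
print(one_factor_pvalue(0.013, 0.009, 0.07))  # roughly 0.2, in line with Table 7
```

The sketch ignores finite-portfolio (granularity) effects, so its p-values will differ slightly from an implementation that models the exact number of obligors per class.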
Table 7: Results of the One Factor Test

Rating Class   Total in Class   No. of Defaults/year   Default Rate (annualised)   PD Master Scale   p-Value    Conclusion
A              6488             84                     1.3%                        0.9%              0.205833   PDNU
D              139              15                     11.0%                       1.8%              0.000367   PDU

The table shows the one-factor test for classes A and D, which failed the binomial test. It
indicates that the underestimation of the default rate by the master scale PD for class A is
attributable to systematic factors, in particular the business cycle, which was in a
downturn. For class D, however, the one-factor test still shows at the 1% significance level
that the default rate was underestimated.

The underestimation of defaults leads to the underestimation of provisions and, if the bank
uses its internally derived PDs for regulatory capital, to the underestimation of regulatory
capital. The bank would then hold credit risk capital that is not commensurate with its
credit risk profile.

4.3.1 Analysis of Bank 2 Ratings Data

The tests in this section are based on the assumption that, for a perfect rating scale, all
defaults come from the worst rating class. The CAP is constructed as shown in the diagram
below. The accuracy ratio is then computed as the ratio of the area between the rating
system's CAP curve and the diagonal, to the maximum possible such area, namely the area
between the CAP curve of a perfect rating system and the diagonal.
With time, the credit quality of borrowers changes as various macroeconomic variables
change. Thus it is critical to monitor these migrations to ensure credit decisions are
informed by relevant credit metrics. The likelihood of such migrations is measured by
transition matrices as discussed in chapter 2.
The following is a migration matrix computed from Bank 2's obligor ratings.

Table 8: Migration Matrix for Bank 2 Credit Portfolio
(rows: rating at the beginning of the period; columns: rating at the end of the period)

       A      B      C      D      E
A    1.00   0.00   0.00   0.00   0.00
B    0.00   0.92   0.00   0.00   0.08
C    0.00   0.00   0.90   0.00   0.10
D    0.00   0.00   0.00   0.65   0.35

The transition matrix shows that ratings assigned to counterparties are stable over time, as
evidenced by the high probabilities of counterparties remaining in their originally assigned
classes. Such stability is usually characteristic of a TTC rating system. However, the
complete absence of migration from an original class to any class other than default (E)
needs to be interrogated: it suggests that the bank may only rate borrowers at loan
granting, with no further rating revisions during the life of the loan. Such a practice
produces direct jumps from originally assigned ratings into the default class, and if the
bank's rating system is based on a PIT rating philosophy it may produce increasingly
biased ratings over time.

The mobility index MP of the above transition matrix was found to be 0.18, implying a
TTC rating system. A TTC rating philosophy is most appropriate for setting either
regulatory capital (RC) or economic capital (EC), as it provides ratings that are stable
through the business cycle.
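The text does not specify which mobility index MP denotes. As a hedged sketch, the simple trace-based (Shorrocks) index over the four non-default grades reproduces the reported value of 0.18 for the Table 8 matrix; the function name and the choice of index are our assumptions.

```python
def mobility_index(P):
    """Shorrocks mobility index: (n - trace(P)) / (n - 1).

    P: square transition matrix over the n rating grades (default column
    excluded). 0 means no migration; values near 1 mean high mobility.
    Only the diagonal enters the formula.
    """
    n = len(P)
    trace = sum(P[i][i] for i in range(n))
    return (n - trace) / (n - 1)

# Diagonal of Table 8 over grades A-D (migrations to the default state E
# dropped, so rows need not sum to 1; only the diagonal matters here).
P = [[1.00, 0.00, 0.00, 0.00],
     [0.00, 0.92, 0.00, 0.00],
     [0.00, 0.00, 0.90, 0.00],
     [0.00, 0.00, 0.00, 0.65]]
print(round(mobility_index(P), 2))  # → 0.18
```

The Jafry-Schuermann singular-value metric cited in the bibliography is an alternative that also accounts for off-diagonal structure; for this matrix, with no migration between non-default grades, the trace-based index is the simplest candidate that matches the reported figure.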
(a) Cumulative Accuracy Profile
Figure 7: CAP for Ratings Data from Bank 2
The maximum possible area is the area under the CAP curve of the perfect rating system
associated with the rating system under analysis.

The accuracy ratio ranges between -1 and 1 and must be greater than zero for a rating
system to add value. An accuracy ratio of 1 represents a rating system with perfect
discriminatory power, while a ratio of 0 represents a rating system that assigns borrowers
to rating grades at random and therefore has extremely weak or no discriminatory power.

The CAP of Bank 2's rating system, shown in the diagram above, was constructed using
ratings generated by the bank's rating system. The rating system has an accuracy ratio of
0.54, which shows that it adds value in discriminating between counterparties with respect
to credit quality.
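For completeness, the accuracy-ratio computation, equivalent in spirit to the VBA routine in Appendix 1, can be sketched in Python. This is a hedged illustration with our own function name, assuming grade labels whose sort order runs from worst to best.

```python
from collections import Counter

def accuracy_ratio(ratings, defaults):
    """Accuracy ratio from per-obligor grades and default flags.

    ratings: grade labels whose natural sort order runs from worst to best;
    defaults: 1 if the obligor defaulted, 0 otherwise.
    """
    n, d = len(defaults), sum(defaults)
    total, bad = Counter(), Counter()
    for grade, flag in zip(ratings, defaults):
        total[grade] += 1
        bad[grade] += flag
    x = y = area = 0.0
    for grade in sorted(total):          # walk grades from worst to best
        x1, y1 = x + total[grade] / n, y + bad[grade] / d
        area += (x1 - x) * (y + y1) / 2  # trapezoid under the CAP curve
        x, y = x1, y1
    perfect = 1 - d / (2 * n)            # area under a perfect CAP
    return (area - 0.5) / (perfect - 0.5)

# All defaults in the worst grade (grade 1): perfect discrimination
print(accuracy_ratio([1, 1, 2, 2], [1, 1, 0, 0]))  # → 1.0
```

A scale that lumps every obligor into a single grade is uninformative and returns 0.0, matching the interpretation of the accuracy ratio given above.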
4.3.2 Test for Calibration
(a) Binomial Test
In testing the calibration power of the rating system, we also apply the binomial test, this
time to the portfolio data of Bank 2. The results are shown in the table below.
Table 9: Binomial Test for Bank 2's Rating System

Rating Class   Annual Default Rate   PD Master Scale   p-Value    Conclusion
A              0.00%                 0.01%             0.998702   PDNU
B              8.00%                 1.05%             0.887813   PDNU
C              10.00%                5.00%             0.811323   PDNU
D              35.00%                22.00%            0.923510   PDNU

The rejection criterion for the null hypothesis is the same as for Bank 1 above. The table
shows that, at the 1% significance level, we fail to reject the null hypothesis that the
master scale PDs do not underestimate the observed default rates for the rating system.

In this section we presented the results of the validation of Bank 1's and Bank 2's rating
systems. The tests highlighted that Bank 1's rating system has poor discrimination and
requires re-calibration of some of its parameters. Bank 2's rating system, by contrast, has
been performing according to its intended objectives.
Chapter 5: Conclusion
In this chapter we give the main results of our research and offer our recommendations.
The research noted that currently available validation approaches enable users to
objectively determine whether rating systems are sufficient for their intended objectives.
The binomial test and the one-factor test help assess the significance of a rating system's
parameters. Interviews with the management of Bank 1 highlighted that their rating system
had not been validated since inception, despite having been in use for more than five
years. The results in chapter 4 showed that the bank's rating system uses available
borrower information sub-optimally and hence does not discriminate appropriately
between borrowers with respect to credit risk. We have thus been able to show that proper
validation techniques are necessary to ensure that rating systems operate as intended.
Using such a rating system for capital purposes is dangerous, as it facilitates a serious
understatement of the provisions and capital that the bank is supposed to hold, resulting in
inadequate buffers against expected and unexpected credit risk losses.
Bank 2’s rating system was operating as intended even though it had not been validated
since inception. It is, however, critical for the institution to start validating the rating
system to monitor its continued relevance to changes in the operating environment and
portfolio mix of borrowers. The study also noted that it is critical for banking institutions
to adopt and, most importantly, operationalise a sound definition of default, so that
defaults are identified consistently and objectively when they occur.
As has been discussed, the Zimbabwean market, like the rest of the MEFMI region, is
characterised by few corporate entities rated by reputable external rating agencies, a lack
of expertise in rating system development and maintenance, an absence of marketable
credit securities (e.g. credit default swaps) and short credit data histories. In such an
environment it may be necessary for regulators and supervisors to facilitate the creation of
a central credit risk data registry. Such a registry enables the pooling of data for the
validation and calibration of various rating parameters. Supervisors also need to develop
skills in rating validation so that they can, in turn, guide bankers on sound validation
standards.
It is therefore recommended that supervisors in the MEFMI region begin to nurture
competences in sound validation practices to ensure that these remain relevant to changes
in their financial systems on an ongoing basis. To overcome the challenges regarding
sufficiency of data, it is necessary that central banks facilitate the establishment of central
credit registers, which enable banks to determine the level of borrower leverage and to
pool data. This will also assist in providing data histories of adequate length for the
calibration and validation of various credit risk rating parameters. It is also critical that
central banks in the MEFMI region provide guidance to banks operating in their
jurisdictions on sound principles for operating and maintaining rating systems.
Appendix 1: Visual Basic Program for CAP
Function CAP(ratings, defaults)
'Function assumes input data is sorted from worst rating to best
Dim N As Long, numdef As Long, a As Integer, i As Long, K As Long
Dim xi As Double, yi As Double, xy(), area As Double
N = Application.WorksheetFunction.Count(defaults)
numdef = Application.WorksheetFunction.Sum(defaults)
'Determine number of rating categories K
K = 1
For i = 2 To N
If ratings(i) <> ratings(i - 1) Then K = K + 1
Next i
ReDim xy(1 To K + 2, 1 To 2)
'First row of function reserved for accuracy ratio, 2nd is origin (0,0)
'so start with a=3
a = 3
For i = 1 To N
'cumulative fraction of observations (xi) and defaults (yi)
xi = xi + 1 / N
yi = yi + defaults(i) / numdef
'Determine CAP points and area below CAP
If ratings(i) <> ratings(i + IIf(i = N, 0, 1)) Or i = N Then
xy(a, 1) = xi
xy(a, 2) = yi
area = area + (xy(a, 1) - xy(a - 1, 1)) * (xy(a - 1, 2) + xy(a, 2)) / 2
a = a + 1
End If
Next i
'Accuracy ratio
xy(1, 1) = (area - 0.5) / ((1 - numdef / N / 2) - 0.5)
xy(1, 2) = "(Accrat)"
CAP = xy
End Function
Bibliography
1. Basel Committee on Banking Supervision (2006). "International Convergence of Capital Measurement and Capital Standards: A Revised Framework", Comprehensive Version, Bank for International Settlements, Basel, Switzerland.
2. Basel Committee on Banking Supervision (1988). "International Convergence of Capital Measurement and Capital Standards", Bank for International Settlements, Basel, Switzerland.
3. Basel Committee on Banking Supervision (2000). "Principles for the Management of Credit Risk", Bank for International Settlements, Basel, Switzerland.
4. Basel Committee on Banking Supervision (2006). "Core Principles for Effective Banking Supervision", Bank for International Settlements, Basel, Switzerland.
5. Basel Committee on Banking Supervision (2005). "Basel Committee Newsletter No. 4", Bank for International Settlements, Basel, Switzerland.
6. Basel Committee on Banking Supervision (2005). "Studies on the Validation of Internal Rating Systems", Working Paper No. 14, Bank for International Settlements, Basel, Switzerland.
7. Blochwitz, S. and Hohl, S. (2006). In "The Basel II Risk Parameters", pp. 243-262, Springer, New York, USA.
8. Board of Governors of the Federal Reserve System (2009). "The Supervisory Capital Assessment Program: Overview of Results".
9. Board of Governors of the Federal Reserve System (2009). "The Supervisory Capital Assessment Program: Design and Implementation".
10. Colquitt, J. (2007). "Credit Risk Management", Third Edition, McGraw-Hill, New York, USA.
11. Dowd, K. (2006). "Measuring Market Risk", Second Edition, John Wiley & Sons Ltd, West Sussex, England.
12. Engelmann, B., Hayden, E. and Tasche, D. (2002). "Measuring the Discriminative Power of Rating Systems".
13. Engelmann, B., Hayden, E. and Tasche, D. (2003). "Testing Rating Accuracy", Risk.net.
14. Engelmann, B. and Rauhmeier, R. (2006). "The Basel II Risk Parameters", Springer, New York, USA.
15. Jafry, Y. and Schuermann, T. (2003). "Metrics for Comparing Credit Migration Matrices", Wharton Financial Institutions Center.
16. Löffler, G. and Posch, P. N. (2007). "Credit Risk Modeling Using Excel and VBA", John Wiley & Sons Ltd, West Sussex, England.
17. Ncube, A. and Kavuma, S. (2009). "Current Perspectives, Challenges and Consensus on Developing Domestic Financial Markets in MEFMI Countries", MEFMI Forum, Issue No. 7.
18. Rauhmeier, R. and Scheule, H. (2005). "Rating Properties and Their Implications on Basel II Capital".
19. Reserve Bank of Zimbabwe (2011). "Guideline No. 1-2011/BSD: Technical Guidance on the Implementation of Basel II in Zimbabwe".
20. Resti, A. and Sironi, A. (2007). "Risk Management and Shareholders' Value in Banking", John Wiley & Sons Ltd, West Sussex, England.
21. Sobehart, J. R., Keenan, S. C. and Stein, R. M. (2000). "Benchmarking Quantitative Default Risk Models: A Validation Methodology", Moody's Investors Service.
22. Tasche, D. (2009). "Estimating Discriminatory Power and PD Curves When the Number of Defaults is Small".