
  • 8/17/2019 ViscoVery SOM (clustering)

    1/173

CREDIT RATING PREDICTION
USING SELF-ORGANIZING MAPS

Visually exploring and constructing a quantitative model

Roger P.G.H. Tan


CREDIT RATING PREDICTION
USING SELF-ORGANIZING MAPS

Visually exploring and constructing a quantitative model

Roger P.G.H. Tan
Student nr. 140033
Erasmus University Rotterdam
Faculty of Economics
July 2000


contents

contents iv
preface viii
1 introduction 1
1.1 Overview 2
1.2 Research domain 4
1.2.1 Bond ratings 4
1.2.2 Financial data and ratings 4
1.2.3 Self-Organizing Maps 5
1.3 Research topics 6
2 credit ratings 9
2.1 Credits and credit ratings 10
2.1.1 Bonds 10
2.1.2 Credits 11
2.1.3 Credit ratings 12
2.1.4 Ratings and default risk 13
2.2 The S&P credit rating process 15
2.2.1 Process steps 15
2.3 Financial statement analysis 18
2.3.1 Financial statement 18
2.3.2 Financial ratios 18
2.3.3 Balance sheet and income statement 19
2.3.4 Used ratios 20
2.4 Summary 23
3 self-organizing maps 25
3.1 Knowledge discovery 26
3.1.1 Introduction 26
3.1.2 Knowledge discovery process 26
3.1.3 Description and prediction 27
3.2 Projection and clustering techniques 28
3.2.1 Linear projection 28
3.2.2 Non-linear projection 29
3.2.3 Hierarchical clustering 30
3.2.4 Non-hierarchical clustering 30
3.3 Classification techniques 31
3.3.1 Linear regression 31
3.3.2 Ordered logit 31
3.3.3 Artificial neural networks 32
3.4 Self-Organizing Maps 33
3.4.1 Introduction 33
3.4.2 Overview 34
3.5 SOM projection 36
3.5.1 The self-organization process 36
3.5.2 A two-dimensional example 37
3.5.3 Mathematical description 39
3.5.4 A three-dimensional example 40
3.6 SOM visualization and clustering 41
3.6.1 Maps 41
3.6.2 Map quality 43
3.6.3 Clusters 45
3.6.4 Cluster quality 47
3.6.5 Map settings 47
3.7 SOM interpretation and evaluation 50
3.7.1 Description 50
3.7.2 Prediction 52
3.8 SOM questions and answers 55
3.9 Summary 57
4 descriptive analysis 59
4.1 Basic data analysis 60
4.1.1 Data selection 60
4.1.2 Pre-processing & transformation 64
4.2 Clustering companies 67
4.2.1 Creating suitable maps 67
4.2.2 Intermediate results 68
4.2.3 Results 70
4.3 Comparing S&P ratings 72
4.3.1 Associating ratings 72
4.3.2 Measuring the goodness of fit 73
4.3.3 Results 75
4.4 Sensitivity analysis 78
4.4.1 Cluster coincidence plots 78
4.4.2 Results 79
4.5 Benchmark 82
4.5.1 Principal Component Analysis 82
4.5.2 Results 82
4.5.3 Comparison with SOM 84
4.6 Summary 85
5 classification model 87
5.1 Model set-up 88
5.1.1 Training and prediction 88
5.1.2 Data 88
5.1.3 The prediction process 89
5.1.4 Ratings distribution 89
5.1.5 Measuring performance 91
5.2 Model construction 95
5.2.1 Initial model 95
5.2.2 Variable reduction 96
5.2.3 Sensitivity analysis 99
5.2.4 Results 102
5.3 Model validation 104
5.3.1 Comparison with constant prediction 104
5.3.2 Comparison with random prediction 104
5.3.3 Classifications per rating class 106
5.3.4 Equalized ratings distribution 106
5.4 Benchmark 108
5.4.1 Linear regression 108
5.4.2 Ordered logit 108
5.4.3 Results & comparison with SOM 109
5.5 Out-of-sample test 112
5.5.1 Results for test set 112
5.5.2 Results for older historical periods 113
5.5.3 Linking spreads 114
5.6 Summary 117
6 conclusions 119
6.1 Conclusions 120
6.2 Further research 122
7 bibliography 123
appendix 125
I Artificial neural networks 126
II Iterations of the SOM algorithm 128
III SOM example: Rectal muscle sizes 131
IV SOM example: Customer segmentation 134
V Statistical measures and tests 138
VI Descriptive analysis 140
VII Classification model 163


preface

This master's thesis forms the conclusion to my study of Econometrics, with specialization Business Oriented Computer Science, at the Erasmus University of Rotterdam. It was written during my internship at the Quantitative Research (QR) department of the Rotterdam-based asset manager Robeco Group. My time at the Robeco Group has been very enjoyable, and the combination of practical research and writing at the same time has proven to be a very relaxed and sure way of writing a thesis. I can recommend this to everyone in the final stage of his or her study.

This thesis is targeted at readers from two different scientific areas (computer science and financial econometrics), so some concepts are treated more extensively than may at first seem necessary. Considerable time was also spent making this thesis into an attractive package, but at all times I have striven to keep looks and content in good balance.

Naturally I could not have written this without the comments and encouragement I received from many people, some of whom I would like to mention especially: First and foremost I would like to thank dr.ir. Peter Ferket, my mentor at Robeco and head of QR, and dr.ir. Jan van den Berg and drs. Willem-Max van den Bergh, both associate professors at the faculty of Economics at the Erasmus University. They all provided invaluable comments on this thesis in its several stages of development. Furthermore my gratitude goes out to the members of the Credits research team, to my roommates and to the other colleagues at QR, for answering the many questions a Computer Science graduate inevitably has when acting like an econometrician. Finally I want to say thanks to dr. Guido Deboeck (Virtual Imagineer, U.S.A.) and dr. Gerhard Kranner (Eudaptics, Austria) for taking the time to answer my many emails, providing new insights and a better understanding of Self-Organizing Maps. Eudaptics also generously supplied me with the latest version of their Viscovery SOMine software, so that I could focus on the real research subject instead of having to devote time to programming.

As much as I have loved the past few years I spent partly studying, partly working and partly partying, I'm glad this stage of my life has come to a conclusion. I'm looking forward to putting even more energy into my new job than I have put into this thesis.

Roger Tan, July 2000


1 introduction

In chapter 1 we introduce the main problem and the research topics for this thesis. Paragraph 1 gives a brief overview of the problem setting and paragraph 2 describes the domain of research. Paragraph 3 reviews the central question and several sub-questions to be answered in the remainder of this thesis.


Several techniques have been developed for these kinds of analyses. We will focus on a less common technique called Self-Organizing Maps, which is a combination of a projection and a clustering algorithm. Its main advantages are the insightful visualizations of large datasets and its flexibility.


1.2 Research domain

1.2.1 Bond ratings

Bond ratings are letter values on an ordinal scale, giving an opinion of the creditworthiness of the issuer of a bond. The two most important rating agencies (issuers of ratings) are Standard & Poor's and Moody's. The ratings issued by these two agencies are comparable, but in this thesis we will focus on Standard & Poor's.

Examples of ratings are AA or B; the full rating scale is shown in table 1-1. A low rating (e.g. CC) corresponds to a high default risk, a high rating (e.g. AA) corresponds to a low default risk. A 'D' indicates an actual default on the bond. The scale is further refined by appending '+' or '-' to the letter rating, indicating a slightly better or slightly worse rating.

Nowadays, more and more companies have been rated, but still most rated companies are based in the United States of America. Also, more historical data is available for these companies. Therefore, our research will be conducted using only U.S. based companies.

1.2.2 Financial data and ratings

Rating agencies claim that the issued ratings are based on (1) a quantitative analysis of the financial statement of a company and (2) a qualitative analysis of the company and its environment: What is the long-term strategy, are there any impending threats to future profitability not expressible in the financial statement (like lawsuits), and what is the economic outlook for the sector as a whole? We will treat the credit rating process extensively in chapter 2, but suffice it to say that the contribution of qualitative factors to the rating is unclear. We can clarify the relationship between financial data and credit ratings using quantitative techniques like the Self-Organizing Map, and indirectly give an assessment of the contribution of qualitative factors.

Financial statement data on most US companies is available in huge databases from data sources like Compustat and WorldScope. The information in these databases could help us gain a better understanding of the relationship between financial information and bond ratings. It might even provide us with a means to correctly

Table 1-1 Credit rating scale

Standard & Poor's
AAA
AA+
AA
AA-
A+
A
A-
BBB+
BBB
BBB-
BB+
BB
BB-
B+
B
B-
CCC+
CCC
CCC-
CC
C
D
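Because the scale in table 1-1 is ordinal, quantitative work often maps the letter ratings to integers. A minimal sketch of such a mapping (the numeric coding, with AAA as the highest value, is our own illustrative choice and is not part of the thesis):

```python
# S&P letter ratings from table 1-1, ordered from worst (D) to best (AAA).
SP_SCALE = ["D", "C", "CC", "CCC-", "CCC", "CCC+",
            "B-", "B", "B+", "BB-", "BB", "BB+",
            "BBB-", "BBB", "BBB+", "A-", "A", "A+",
            "AA-", "AA", "AA+", "AAA"]

# Ordinal code per rating: 0 for D up to 21 for AAA.
RATING_TO_ORDINAL = {r: i for i, r in enumerate(SP_SCALE)}

def compare(r1: str, r2: str) -> int:
    """Positive if r1 is rated higher than r2, negative if lower, 0 if equal."""
    return RATING_TO_ORDINAL[r1] - RATING_TO_ORDINAL[r2]
```

For example, `compare("AA", "CC")` is positive, reflecting that AA carries a much lower default risk than CC.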


predict bond ratings, based on the stored financial data alone. However, transforming the stored data into knowledge is no trivial task.

1.2.3 Self-Organizing Maps

A common problem is the complex nature of large amounts of data. Our universe contains a large number of companies, and for each company many financial characteristics are available. This hinders the inference of sensible relationships; to cope with the problem, specific techniques have been developed2. In this thesis we will focus on the Self-Organizing Map technique.

Self-Organizing Maps (SOMs) use an advanced algorithm to form an as good as possible representation of the data. Clusters of similar companies are identified and displayed on a map, using colours to enhance the representation. The voluminous original dataset is compressed into a 2-dimensional, easily readable map. The contributions of individual characteristics are also part of the display, making it possible to visually infer relationships from the underlying data.

The Self-Organizing Map can be used as a visual exploration tool and as a classification model. Both functions will be illustrated using our bond rating problem.

     

2 Fayyad, U.M., 1996, Chapter 1.
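The self-organization idea sketched above can be made concrete in a few lines. The following is a minimal illustrative implementation of the classic online Kohonen algorithm, written for intuition only; the thesis itself uses the Viscovery SOMine package, not this code, and all parameter choices below are our own assumptions:

```python
import numpy as np

def train_som(data, rows=8, cols=8, iterations=2000, seed=0):
    """Train a small Self-Organizing Map on `data` (n_samples x n_features).

    Online Kohonen algorithm: pick a random sample, find its best-matching
    unit (BMU), then pull the BMU and its map neighbours towards the sample,
    with learning rate and neighbourhood radius shrinking over time.
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    # One weight vector per map node, randomly initialised within the data range.
    weights = rng.uniform(data.min(axis=0), data.max(axis=0), size=(rows, cols, dim))
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    sigma0, lr0 = max(rows, cols) / 2.0, 0.5
    for t in range(iterations):
        x = data[rng.integers(n)]
        # BMU: the node whose weight vector is closest to x.
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # Exponentially decaying radius and learning rate.
        frac = t / iterations
        sigma = sigma0 * np.exp(-3.0 * frac)
        lr = lr0 * np.exp(-3.0 * frac)
        # Gaussian neighbourhood on the map grid around the BMU.
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def quantization_error(data, weights):
    """Average distance from each sample to its best-matching unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    return np.mean(np.min(np.linalg.norm(flat[None] - data[:, None], axis=-1), axis=1))
```

After training, similar samples map to nearby nodes, which is what makes the 2-dimensional cluster display described above possible.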


1.3 Research topics

The earlier sketched domain forms the background for the following central question in this thesis:

In what way can we use Self-Organizing Maps to explore the relationship between financial statement data and credit ratings?

This question can be broken down into the following five sub-questions:

1. What are credit ratings and how is the credit rating process structured?

An analysis of the Standard & Poor credit rating process gives us a better understanding of the relation between credit ratings and financial statement data.

2. What are Self-Organizing Maps and how can they aid in exploring relationships in large data sets?

Before we can trust the results inferred from the SOM maps we first have to understand how the SOM gives a view on the underlying data. We provide an in-depth review of the algorithm itself and a guide on how to interpret the generated results.

3. Is it possible to find a logical clustering of companies, based on the financial statements of these companies?

First we would like to know if companies are discernible based on financial statement data alone.

4. If such a clustering is found, does this clustering coincide with levels of creditworthiness of the companies in a cluster?

We then compare the found clustering with the distribution of the ratings over the companies to determine to what extent they coincide.

5. Is it possible to classify companies in rating classes using only financial statement data?

Using previously found knowledge we set up a model specifically suited to the task of classifying new companies using financial statement data.


This thesis is divided into several chapters. Chapter 1 contains the introduction, a description of the research domain and this overview of the research topics. In chapter 2 we give a theoretical treatment of the credit rating process and in chapter 3 we provide an in-depth review of Self-Organizing Maps. Chapter 4 discusses the descriptive analysis, after which chapter 5 focuses on the classification model. In chapter 6 we draw our conclusions and present some suggestions for further research.


2 credit ratings

This chapter provides a background on credits and credit ratings. Question 1 from the introduction is answered:

1. What are credit ratings and how is the credit rating process structured?

Paragraph 1 addresses the theoretical foundations of credits and credit ratings. Paragraph 2 reviews the rating process of Standard & Poor's, a well-known rating agency. Paragraph 3 evaluates the key financial ratios applicable to the economic sector under scrutiny in this thesis, Consumer Cyclicals.


2.1 Credits and credit ratings

2.1.1 Bonds

In its most simple form a bond is a loan from one entity to another. The entity that receives the loan (often a government or a large company) is called the obligor or issuer; the loan itself is called a bond obligation or issue. The bond is freely tradable on the exchanges and split up into smaller parts, to make the bond more marketable.

Bonds belong to the group of fixed-income instruments, because they periodically pay a fixed amount (the coupon) to the buyer of the bond. Bonds differ from equity (or stockholders' shares) in that buyers of bonds do not become owners of the company. When a company goes into bankruptcy, the owner of the bond is in a better position than the shareholder: first all the loans are redeemed, and from what is left (if anything) the owners are repaid.

Characteristics

Each bond has certain characteristics, which fully describe the bond. The bond has to be redeemed on a fixed date, called the maturity date. Bonds with original maturities longer than a year are considered long-term; all bonds with maturities up to one year are considered short-term. Each period a certain interest percentage has to be paid in the form of the coupon. Often this percentage is fixed, but sometimes it depends on the market interest rate (the coupon is floating). Other variations on the standard bond include sinking redemptions (periodically a part of the bond is redeemed), callable bonds (at certain dates the issuer has the right to prematurely redeem the bond), and of course special combinations leading to more exotic variants.

Value

The value of a bond depends largely on the coupon percentage and the current market interest rate. If the market interest rate rises, the value of the bond falls: the coupon percentage is fixed, and investors would rather buy a new bond with a coupon that is more in line with the current market interest rate. If the market interest rate declines, the value of the bond rises: investors would rather buy our bond than new bonds with lower interest rates.

The value of the bond is determined in the market, by the forces of supply and demand. Using the market price, the current yield of the bond can be calculated. This is the internal discount factor needed when discounting all future cash flows of the bond (coupon payments and redemption payment) to arrive at the current price.
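The price-yield relationship described above can be made concrete with a small sketch. This is our own illustration, not part of the thesis; the annual-coupon simplification and the function names are assumptions:

```python
def bond_price(face, coupon_rate, years, yield_rate):
    """Price a standard annual-coupon bond by discounting all future
    cash flows (coupons and the redemption payment) at the given yield."""
    coupon = face * coupon_rate
    price = sum(coupon / (1 + yield_rate) ** t for t in range(1, years + 1))
    return price + face / (1 + yield_rate) ** years

def current_yield_from_price(face, coupon_rate, years, market_price):
    """Recover the internal discount factor (the yield) that equates the
    discounted cash flows with the observed market price, by bisection."""
    lo, hi = 0.0, 1.0  # assume the yield lies between 0% and 100%
    for _ in range(100):
        mid = (lo + hi) / 2
        if bond_price(face, coupon_rate, years, mid) > market_price:
            lo = mid  # computed price too high, so the yield must be higher
        else:
            hi = mid
    return (lo + hi) / 2
```

At par the yield equals the coupon rate: a 10-year 5% bond priced at 100 yields 5%. The bisection works because price is strictly decreasing in yield, which is exactly the inverse relationship between bond values and market interest rates described above.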


2.1.3 Credit ratings

According to Standard & Poor's (S&P), "the bond or credit rating is an opinion of the general creditworthiness of an obligor with respect to a particular debt security or other financial obligation, based on relevant risk factors."⁴ All rating agencies seem to support this definition.

Rating agencies

A rating agency, of which S&P is one of the best known examples, assesses the relevant factors relating to the creditworthiness of the issuer. These include quantitative factors like the profitability of the company and the amount of outstanding debt, but also qualitative factors like skill of management and economic expectations for the company. The whole analysis is then condensed into a letter rating⁵. Standard & Poor's and Moody's have both been rating bonds for almost a century and are the leading rating agencies right now. Other reputable rating institutions are Fitch and Duff & Phelps.

Ratings interpretation

The types of assigned ratings are comparable for most agencies, and for S&P and Moody's there is a direct

     

4 Standard & Poor's, 2000, page 7.

Table 2-1 Credit ratings and interpretation

S&P                  Moody's   Interpretation
AAA                  Aaa       Highest quality
AA+                  Aa1       High quality
AA                   Aa2
AA-                  Aa3
A+                   A1        Strong payment capacity
A                    A2
A-                   A3
BBB+                 Baa1      Adequate payment capacity
BBB                  Baa2
BBB-                 Baa3
BB+                  Ba1       Likely to fulfil obligations; ongoing uncertainty
BB                   Ba2
BB-                  Ba3
B+                   B1        High risk obligations
B                    B2
B-                   B3
CCC+, CCC, CCC-, CC  Caa       Current vulnerability to default, or in default (Moody's)
C                    Ca        Bankruptcy filed
D                    D         Defaulted
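The direct correspondence in table 2-1 can be written down as a lookup table. A sketch of one direction of the mapping (the grouping of the CCC range onto Caa follows our reading of the table's layout):

```python
# S&P -> Moody's correspondence, transcribed from table 2-1.
# Several S&P notches in the CCC range share the single Moody's class Caa.
SP_TO_MOODYS = {
    "AAA": "Aaa", "AA+": "Aa1", "AA": "Aa2", "AA-": "Aa3",
    "A+": "A1", "A": "A2", "A-": "A3",
    "BBB+": "Baa1", "BBB": "Baa2", "BBB-": "Baa3",
    "BB+": "Ba1", "BB": "Ba2", "BB-": "Ba3",
    "B+": "B1", "B": "B2", "B-": "B3",
    "CCC+": "Caa", "CCC": "Caa", "CCC-": "Caa", "CC": "Caa",
    "C": "Ca", "D": "D",
}
```

Because the CCC range collapses onto one Moody's class, the mapping is not invertible notch-for-notch at the low end of the scale.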


Figure 2-1 shows the default rates corresponding to Moody's rating classes for 1999⁷. As is to be expected, the lower rating classes have correspondingly higher default rates.

Investment grade versus speculative grade

Credits with an assigned rating from AAA to BBB- are known as investment grade credits. Lower rated issues are known as speculative grade credits, high yield issues or junk bonds. The spreads on these high yield issues are relatively wide, thus providing an interesting investment opportunity. This is even more so after finding an average recovery rate of 42%⁸ (for every U$ 100 worth of defaults, on average U$ 42 recovers).

Sometimes fund managers are restricted to purchasing investment grade issues, to avoid speculative investments. However, the absolute default rates do not remain stable over the years. For example, restricting the fund managers to purchase at least BBB- grade issues does not guarantee lower than 1% default rates.

7 Moody's, 2000, page 26.
8 Moody's, 2000, page 17.

[Figure 2-1: bar chart of default rates (%) for 1999, one bar per Moody's rating class from Aaa down to B3; vertical axis 0-12%.]

Figure 2-1 Default rates for 1999


2.2 The S&P credit rating process

"The rating experience is as much an art as it is a science." – Solomon B. Samson, Chief Rating Officer at Standard & Poor's⁹.

This paragraph describes the credit rating process of Standard & Poor's. Most information contained in this paragraph was taken from the "Corporate Ratings Criteria" document, published on-line at the S&P website. In this document the distinction between the qualitative and the quantitative analysis is less clear: the qualitative analysis is treated most extensively and thus receives most emphasis. The descriptive analysis in chapter 4 will try to uncover whether this depiction reflects the actual rating practice of S&P.

2.2.1 Process steps

The Standard & Poor's credit rating process can be broken down into several steps. The process is summarized in figure 2-2.

Request rating

Companies themselves often approach Standard & Poor's to request a rating. In addition to this, it is S&P's policy to rate any public corporate debt issue larger than U$ 50 million, with or without a request from the issuer.

Basic research

When the rating is requested, a team of analysts is gathered. The analysts working at S&P each have their own sector specialty, covering all risk categories in the sector.

9 Standard & Poor's, 1999.

[Figure 2-2: flowchart: request rating → assign analytical team → conduct basic research → meet issuer → rating committee meeting → issue(r) rating → surveillance, plus an appeals process.]

Figure 2-2 The Standard & Poor's credit rating process


The appropriate analysts are chosen and a lead analyst is assigned, who is responsible for the conduct of the rating process.

Some basic research is conducted, based on publicly available information and on information received from the company prior to the meeting with the management¹⁰. The information requested prior to the meeting should contain:

- five years of audited annual financial statements (balance sheet and profits and losses account),
- the last several interim financial statements (this is mostly applicable to US companies, as they are required by law to provide quarterly financial statements),
- narrative descriptions of operations and products,
- relevant industry information.

As some of this may be sensitive information, S&P has a strict policy of confidentiality on all the information obtained in a non-public fashion. Any published rationale on the realization of the assigned rating only contains publicly available information.

Meeting the issuer

In the next step, a part of the team meets with management of the company to review key factors that have an impact on the rating. This meeting covers the operating and financial plans of the company and the management policies, but it is also a qualitative assessment of management itself. The meeting is scheduled well in advance so ample time for preparation is given.

The specific topics discussed at the meeting are:

- the industry environment and prospects,
- an overview of the major business segments, including operating statistics and comparisons with competitors and industry norms,
- management's financial policies and financial performance goals,
- distinctive accounting practices,
- management's projections, including income and cash flow statements and balance sheets, together with the underlying market and operating assumptions,

10 So called 'public information ratings' are the exception to this rule; they are solely based on the annual publicly available financial statement.


- capital spending plans,
- financing alternatives and contingency plans.

Standard & Poor's does not base its rating on the issuer's financial projections, but uses them to indicate how the management assesses potential problems and future economic developments.

Rating committee and appeals process

Shortly after the meeting with the management of the issuer, the rating committee convenes. The rating committee consists of five to seven voting members, who decide on the rating using information presented by the lead analyst. His presentation covers:

- an analysis of the nature of the company's business and its operating environment,
- an evaluation of the company's strategic and financial management,
- a financial analysis,
- and finally a rating recommendation.

After a discussion about the rating recommendation and the facts supporting it, the committee votes on the recommendation. The issuer is notified of the rating and the major considerations supporting it. An appeal is possible (the issuer could possibly provide new information), but there is no guarantee that the committee will alter its decision.

Publishing the rating

For public issues the new rating is published using several media, e.g. the Internet site or the "CreditWeek" publication by Standard & Poor's. For ratings assigned on request by the issuer, the company itself may determine whether it wants the rating to be publicly available or not. This will often be the case, because rating requests are expensive and a public rating facilitates the negotiations for loans and leases.

Surveillance

The rated issues and issuers are monitored on an ongoing basis. New financial or economic developments are reviewed, and often a meeting with the management is scheduled annually. If these developments might lead to a rating change, this is made known using the CreditWatch listings. A more thorough analysis is performed, after which the rating committee again convenes and decides on the rating change.


2.3 Financial statement analysis

2.3.1 Financial statement

The financial statement of a company comprises the balance sheet and the profits and losses account. There are strict accounting regulations the financial statement must adhere to, which vary for different countries. The financial statements for companies in different sectors also diverge: we would expect a factory to have a raw materials inventory on its balance sheet, but not a bank. The most important differences occur between Financial companies and Industrial companies; the next section describes the financial ratios that are most applicable to Industrial companies.

2.3.2 Financial ratios

The financial performance of a company can be analyzed by carefully examining the balance sheet and income statement for that company. To make these large quantities of data more comprehensible and to make comparisons between firms possible, one often uses financial ratios.

There are several financial ratio classes:

- leverage ratios measure the debt level of a company,
- liquidity ratios measure the ease with which a company can acquire cash,
- profitability ratios measure the profits of a company in proportion to its assets.

In addition to these financial ratios, a few other classes of variables can be observed to characterize a company:

- size variables measure the size of a company,
- stability variables measure the stability of the company over time in terms of size and income,
- market value ratios measure the value investors assign to a company.

Although financial ratios provide a means to quickly compare companies, some caution should be taken when using them. Companies often use different accounting standards, so two comparable companies can have very different values for certain ratios just because of different ways of valuing the items on the balance sheet. Furthermore, companies often want to present an as favourable as possible image, known as 'window dressing'. This also leads to ratios not fully representing the real financial state of the company.


2.3.3 Balance sheet and income statement

The financial ratios are calculated using elements from the balance sheet and from the income statement of a company. They are shown in table 2-2 and table 2-3.

Table 2-2 Balance sheet

Assets                              Liabilities
+ cash & equivalents                + total short term debt
+ total net receivables             + accounts payable
+ total inventory                   + other current liabilities
+ other current assets              + income taxes payable
total current assets                total current liabilities
+ net property, plant & equipment   + total long term debt
+ investment & advances             + other non-current liabilities
+ intangibles                       + deferred income taxes & investment tax credit
+ other assets                      + minority interest
                                    total liabilities
                                    + preferred stock
                                    + total common equity
total assets                        total liabilities & capital

Table 2-3 Income statement

+ net sales
- cost of goods sold
- other expenses
earnings before interest, taxes, depreciation and amortization
- depreciation and amortization expense
earnings before interest and tax
- gross interest expense
+ special items (non-recurring)
pre-tax income
- total income taxes
- minority interest
net income
- preferred dividends
earnings applicable to common stock


    2.3.4  Used ratios

    Our preliminary selection yielded the following financial ratios.

    Interest coverage ratios

    These measure the extent to which interest or debt is covered by the earnings of a company.

    EBIT interest coverage:

    (earnings before interest and taxes) / (interest expenses)

    EBITDA interest coverage:

    (earnings before interest, taxes, depreciation and amortization) / (interest expenses)

    EBIT / total debt:

    (earnings before interest and taxes) / (total debt)

    Leverage ratios

    Financial leverage is created when firms borrow money. To measure this leverage, a number of ratios are available.

    Debt ratio

    (long term debt) / (long term debt + equity + minority interest)

    Debt-equity ratio

    This can be measured in several ways, two of which are:

    (long term debt) / (equity)

    and

    (long term debt) / (total capital)

    Net gearing

    (total liabilities – cash) / (equity)

    Profitability ratios

    Profitability ratios measure the profits of a company in proportion to its assets.


    Return on equity

    This measures the income the firm was able to generate for its shareholders11.

    (net income) / (average equity)

    Return on total assets

    (earnings before interest and taxes) / (total assets)

    Operating income / sales

    (operating income before depreciation) / (sales)

    Net profit margin

    (net income) / (total sales)

    Size variables

    These measure the size of a company.

    Total assets

    The total assets of the company.

    Market value

    Price per share * number of shares outstanding.

    Stability variables

    Stability variables measure the stability of the company over time in terms of size and income.

    Coefficient of variation of net income

    (standard deviation of net income over 5 years) / (mean of net income over 5 years)

    Coefficient of variation of total assets

    (standard deviation of total assets over 5 years) / (mean of total assets over 5 years)

    Market variables

    Market variables are used to assess the value investors assign to a company.

     

    11 Note the use of the average of the equity (at the beginning and the end of the quarter). Averages are often used when comparing flow data (net income) with snapshot data.


    Coefficient of variation of earnings forecasts (fiscal year 1)

    This measures the risk encapsulated in the earnings forecasts (for fiscal year 1) of the several analysts. If the analysts do not agree with each other, that should be an indication of higher risk involved with this company.

    (standard deviation of forecasts fiscal year 1 over analysts) / (mean of forecasts fiscal year 1 over analysts)

    Market beta relative to NYSE

    The beta is the sensitivity of the stock to market movements, in this case movements of the New York Stock Exchange12. A snapshot is taken on the last trading day of the quarter.

    Earnings per share

    This is calculated for the last month of the quarter.

    (earnings applicable to common stock) / (total number of shares)
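    As an illustration, the ratio definitions above translate directly into code. The statement figures below are hypothetical, chosen only to show the arithmetic:

    ```python
    # Toy balance sheet and income statement items (hypothetical, in millions)
    # used to illustrate a few of the ratios defined above.
    ebit = 120.0                 # earnings before interest and taxes
    depreciation = 30.0          # depreciation and amortization expense
    interest_expense = 40.0
    long_term_debt = 500.0
    equity = 800.0
    minority_interest = 20.0
    total_liabilities = 900.0
    cash = 100.0
    net_income = 60.0

    ebit_coverage = ebit / interest_expense
    ebitda_coverage = (ebit + depreciation) / interest_expense
    debt_ratio = long_term_debt / (long_term_debt + equity + minority_interest)
    net_gearing = (total_liabilities - cash) / equity
    roe = net_income / equity    # end-of-period equity, for simplicity

    print(round(ebit_coverage, 2))    # 3.0
    print(round(ebitda_coverage, 2))  # 3.75
    print(round(debt_ratio, 3))       # 0.379
    print(round(net_gearing, 2))      # 1.0
    print(round(roe, 3))              # 0.075
    ```

    Note that, as the text cautions, such numbers are only as comparable as the accounting choices behind them.
    
    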

     

    12 Brealey, R.A. and Myers, S.C., 1991, chapter 7.


    2.4 Summary

    2.4 Summary

    In this chapter we have reviewed some theoretical aspects of bonds and credits before exploring the ratings domain. The credits we are most interested in are bonds issued by companies (corporate bonds). We have seen the direct relation between creditworthiness, default probability and spread of a credit. If the perceived creditworthiness is better, then the assigned rating will be higher and the default probability will be lower. The difference in yield with a similar government bond (also known as the spread) will consequently be lower.

    The different process steps of the Standard & Poor's credit rating process emphasize the qualitative analysis performed by the agency. The quantitative analysis, based on financial statement data, is just a single step in the process. In the remainder of this thesis we will try to uncover whether actual rating practice reflects this depiction of matters, using the described financial ratios. These ratios form a means to summarize the balance sheet and income statement of a company and to compare the financial statements of different companies.


    3 self-organizing maps

    Chapter 3 reviews the Self-Organizing Map and its place in the knowledge discovery process. To provide a background for the SOM we will briefly discuss some related techniques before examining the Self-Organizing Map algorithm. Altogether this answers question 2 from the introduction:

    2.  What are Self-Organizing Maps and how can they aid in exploring relationships in large data sets?

    Paragraph 1 describes the knowledge discovery process. Paragraph 2 describes some projection and clustering methods related to SOM. Paragraph 3 describes the classification techniques that we also use in the classification model of chapter 5. The remainder of the chapter is dedicated to an explanation of SOM and guidelines for the use of SOM.


    3.1  Knowledge discovery

    3.1.1  Introduction

    These days it is quite common for corporations of all kinds and sizes to gather large amounts of data. This may vary from customer data (e.g. scanned purchase data for supermarkets) to data regarding some of the processes within a company (e.g. process states of a machine). On a meso-economic and macro-economic level a lot of data is available too, concerning the financial statements of individual companies or the financial statements of countries.

    The volumes of these databases are often gigantic, making it impossible to retrieve sensible information just by looking at the raw data. To gain access to the knowledge contained in the stored data one has to rely on specific techniques, which extract information from the database in a systematic way. In the ICT sector these techniques are referred to as data-mining13 techniques, and all the steps necessary to extract knowledge from databases are known as the knowledge discovery process.

    3.1.2  Knowledge discovery process

    The knowledge discovery process encompasses all the steps necessary to extract potentially useful information (knowledge) from the database14.

    The basic steps (displayed in figure 3-1) involve:

    -  Creating a target data set based on the available data, the knowledge of the underlying domain and the goals of the research.

    -  Pre-processing this data to account for extreme values and missing values.

    -  Applying any necessary transformations.

    -  ‘Mining’ the data so distinct patterns become available for interpretation and evaluation. In this thesis we will focus on visualization techniques, whereby specific patterns can be found in the resulting maps.

     

    13 Computer scientists use the term data-mining in a positive context (extracting previously unknown knowledge from large databases), econometricians use the term data-mining in a negative context (manipulating data and the used technique to support specific conclusions). This sometimes leads to confusion about the intended meaning.
    14 Fayyad, U.M., et al., 1996, chapter 2.

    Figure 3-1 The knowledge discovery process (data → selection → target data → preprocessing → preprocessed data → transformation → transformed data → visualization → patterns/maps → interpretation and evaluation → knowledge)


    -  Interpreting and evaluating these maps, often repeating one or more steps of the process.

    3.1.3  Description and prediction

    The knowledge discovery process serves two main purposes: description and prediction. Descriptive knowledge discovery tries to correctly represent the data in a compact form. The new representation implicitly or explicitly shows relationships in the data. Not so obvious relationships emerge, thus contributing to a greater knowledge of the underlying domain. Obvious relationships are of course visible too, strengthening the image one has of the data based on preliminary research. Commonly used techniques are projection and clustering algorithms.

    Predictive knowledge discovery is used to complement values for one or more characteristics (or variables) of observations in the data set. This often takes the form of a classification problem: a data set with known class memberships is used to build a model, and this model is used to predict the class membership for new observations. Commonly used techniques are linear-regression-based classifiers like ordered logit and artificial neural networks.

    Of course this division is not strict. Some of the algorithms are combinations of techniques, and often the descriptive techniques are used as an intermediate step in large investigations. The output of the descriptive analysis then may serve as input for some of the prediction algorithms.

    In the following sections we will highlight some of the available projection, clustering, and classification techniques. The Self-Organizing Map, treated extensively in the remainder of the chapter, is actually a neural network combining regression, projection and clustering!


    3.2 Projection and clustering techniques

    We use projection techniques to reduce the dimensionality of the data, making it easier to grasp the essence of the data. Projection techniques can be split into two groups, linear and non-linear projection methods. On the other hand, clustering techniques are designed to reduce the amount of data by grouping alike items together. The dimensionality of the data does not change. The several clustering methods can be split into two common types, hierarchical and non-hierarchical clustering.

    3.2.1  Linear projection

    Linear projection methods use a linear combination of the components of the original data to project the data onto a new co-ordinate system of lower dimensionality using a fixed set of scalar coefficients.

    Principal component analysis (PCA) is a commonly used linear projection method. The PCA technique tries to capture the intrinsic dimensionality of the data by finding the directions in which the data displays the greatest variance. Often the data is stretched in one or more directions and has an intrinsic lower dimensionality than it first may seem (see figure 3-2). These directions in the data are called ‘principal components’. The first principal component describes the direction of the largest variation in the data. The second principal component, orthogonal to the first, describes the direction of the second-largest variation in the data, et cetera. The variation in the data that has not been described by the first N principal components is called the residual variance.

    The data is projected onto a new co-ordinate system spanned by the first two principal components, to give a more accurate view of the data. A drawback of linear projection methods is that they cannot take non-linear or arbitrarily shaped structures in the data into account, possibly leading to incorrect projections.

    In chapter 4, we compare the PCA technique with SOM. A full explanation of principal components can be found in Johnson and Wichern15.

     

    15 Johnson, R.A., and Wichern, D.W., 1992, chapter 8.

    Figure 3-2 Two-dimensional data stretched in one direction
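    The idea can be sketched for two-dimensional data, where the first principal component of the 2×2 sample covariance matrix has a closed form. The data points below are hypothetical, stretched roughly along one direction as in figure 3-2:

    ```python
    import math

    # Hypothetical 2-D data stretched along the y = x direction.
    data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.9), (4.0, 4.1)]

    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form) and its
    # eigenvector: this direction is the first principal component.
    lam = 0.5 * ((sxx + syy) + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    vx, vy = sxy, lam - sxx                  # unnormalized eigenvector
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)

    # Share of total variance captured by the first component; the
    # remainder is the residual variance mentioned in the text.
    explained = lam / (sxx + syy)
    print(pc1, round(explained, 3))
    ```

    For this stretched data the first component points close to the diagonal and captures nearly all of the variance, illustrating the intrinsic one-dimensionality of the sample.
    
    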


    3.2.2  Non-linear projection

    Several techniques exist to project the non-linear structures in the data. They often focus on correctly displaying the differences between observations in the original data space.

    Multi Dimensional Scaling (MDS)16, developed by J.B. Kruskal during the sixties and seventies, actually denotes a whole range of techniques. It aims at placing the original, high dimensional data points on a lower dimensional display in such a way that the relative rank ordering of similarity between observations in the input space is preserved as much as possible. The new distance between the two least similar observations is largest, and vice versa the new distance between the two most similar observations is smallest. The specification of the similarity measure defines the specific version of MDS used; metric MDS uses Euclidean distances17 in the input space, non-metric MDS uses domain-specific relative rank orderings.

    One interesting application of non-metric MDS can be found in archaeology, for the reconstruction of the geography of the Mycenaean kingdom of Pylos in Greece (circa 1200 BC)18. The found palace archives (clay tablets) contain no direct geographical information, but relative distances between cities can be inferred from them. The MDS-based map of the kingdom (figure 3-4) matches the map drawn by experts (figure 3-3) quite closely.

     

    16 Johnson, R.A. and Wichern, D.W., 1992, pages 602-608.
    17 The Euclidean distance d(x, y) between vectors x and y is defined as d(x, y) = √((x_1 − y_1)² + (x_2 − y_2)² + … + (x_n − y_n)²).
    18 See “http://www.archaeology.usyd.edu.au/~myers/multidim.htm” for more information.

    Figure 3-4 MDS map of Pylos kingdom

    Figure 3-3 Expert map of Pylos kingdom


    3.2.3  Hierarchical clustering

    Hierarchical clustering techniques group data items according to some measure of similarity in a hierarchical fashion. They can be divided into splitting and merging methods.

    Splitting methods work top-down, starting with one big cluster. At each step the cluster is divided into two separate clusters, thereby maximizing some inter-cluster distance measure d. The divisional process is stopped when d becomes too small. The found division of the data set is equivalent to a binary tree structure.

    Merging methods work bottom-up, starting with each case in a separate cluster. Clusters having the least inter-cluster distance d are merged; often the Euclidean distance is used for d. An example clustering of car brands is shown in figure 3-5.
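    The merging procedure can be sketched as follows; the points, the single-linkage distance measure and the stopping threshold are illustrative choices:

    ```python
    import math

    # Bottom-up (merging) clustering of a few hypothetical 2-D points.
    # Each case starts in its own cluster; the two clusters with the
    # smallest inter-cluster distance d are merged until d grows too large.

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def cluster_dist(ca, cb):
        # single linkage: distance between the two closest members
        return min(dist(p, q) for p in ca for q in cb)

    def merge_clustering(points, stop_at):
        clusters = [[p] for p in points]          # one cluster per case
        while len(clusters) > 1:
            # find the pair of clusters with minimal inter-cluster distance
            (i, j), d = min(
                (((i, j), cluster_dist(clusters[i], clusters[j]))
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))),
                key=lambda t: t[1])
            if d > stop_at:                       # d has become too large
                break
            clusters[i] += clusters.pop(j)        # merge the two clusters
        return clusters

    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
    print(merge_clustering(points, stop_at=3.0))  # two clusters remain
    ```

    Recording the order of the merges would yield the binary tree (dendrogram) structure mentioned above.
    
    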

    3.2.4  Non-hierarchical clustering

    Non-hierarchical or partitional clustering methods try to directly divide the data into a set of disjoint clusters. This is done in such a way that the intra-cluster distance is minimized and the inter-cluster distance is maximized.

    K-means clustering is a non-hierarchical clustering method that is very much related to Self-Organizing Maps. A set of K reference vectors is chosen with the same dimensionality as the input data. Then for each reference vector a list is made of the observations lying closest to it. The reference vectors are then recomputed by taking the mean over the respective list. Each reference vector (also called ‘centroid’) thus represents the centre of the cluster. This is repeated until the reference vectors do not change much anymore.

    Figure 3-5 Clustering car brands using merging
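    A minimal sketch of this iteration, using hypothetical two-dimensional observations:

    ```python
    import random

    # K-means sketch: K reference vectors (centroids) are repeatedly moved
    # to the mean of the observations assigned to them, until they stop
    # changing. Data and K are illustrative choices.

    def kmeans(data, k, iters=100):
        random.seed(0)
        centroids = random.sample(data, k)        # initial reference vectors
        for _ in range(iters):
            # assignment step: list the observations closest to each centroid
            lists = [[] for _ in range(k)]
            for x in data:
                c = min(range(k),
                        key=lambda i: sum((a - b) ** 2
                                          for a, b in zip(x, centroids[i])))
                lists[c].append(x)
            # update step: recompute each centroid as the mean of its list
            new = [tuple(sum(v) / len(lst) for v in zip(*lst)) if lst
                   else centroids[i] for i, lst in enumerate(lists)]
            if new == centroids:                  # converged
                break
            centroids = new
        return centroids

    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2)]
    print(sorted(kmeans(data, 2)))
    ```

    On these two well-separated groups the centroids end up at the group means, which is exactly the role the SOM's neurons play later in the chapter.
    
    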


    3.3 Classification techniques

    The techniques treated in this paragraph can all be used as classification methods. Linear regression and neural networks are more general methods that can also be used to solve other kinds of problems. The ordered logit model is specifically used for classification problems. All three techniques are used in chapter 5.

    3.3.1  Linear regression

    The multiple linear regression model is used to study the relationship between a dependent variable and several independent variables. The regression equation has the following form:

    y_i = β_1·x_1i + β_2·x_2i + … + β_k·x_ki + ε_i,   i = 1,…,n,

    where y is the dependent or explained variable, x_1,…,x_k are the independent or explanatory variables (also known as regressors), and i indexes the n sample observations. The disturbance ε is used to model external random influences that we cannot capture with the model (e.g. errors of measurement). The coefficients of the independent variables (β_1…β_k) and the disturbance are most often estimated using the Ordinary Least Squares technique. Before we do this a number of assumptions have to be satisfied concerning, amongst others, the dependencies between variables and the distribution of the disturbances. A full overview of the multiple linear regression model is given in Greene19.
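    For illustration, in the special case of a single regressor the OLS estimates have a simple closed form: b1 = cov(x, y) / var(x) and b0 = mean(y) − b1·mean(x). The data below are made up:

    ```python
    # One-regressor OLS sketch on hypothetical data, roughly y = 2x
    # with small disturbances.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]

    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
          / sum((xi - mx) ** 2 for xi in x))      # slope estimate
    b0 = my - b1 * mx                             # intercept estimate

    # With an intercept included, the OLS residuals sum to zero.
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    print(round(b1, 3), round(b0, 3))  # 1.99 0.05
    ```

    The multiple-regressor case solves the analogous normal equations in matrix form, but the principle of minimizing the sum of squared residuals is the same.
    
    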

    3.3.2  Ordered logit

    The ordered logit model is a so-called ordered response model. It is an extension of the binary logit model, which is a regression-based technique: a latent variable is assumed to be the determining factor for class membership. This latent variable is linearly dependent on several regressors and a disturbance.

    y_i = β_1·x_1i + β_2·x_2i + … + β_k·x_ki + ε_i,   i = 1,…,n

    We assume a logistic distribution for the disturbance ε, hence the name ordered logit. Although the classes have to be ordered they need not be of equal width. The classification is seen as a transformation of the latent variable and derived from y using

    x_i ∈ c_1   if   y_i ≤ α_1
    x_i ∈ c_j   if   α_(j−1) < y_i ≤ α_j
    x_i ∈ c_m   if   y_i > α_(m−1)
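    The classification step can be sketched in code; the cut-off points below are hypothetical, since in practice the αs (and the βs of the latent variable) are estimated by maximum likelihood:

    ```python
    import bisect

    # Given an estimated latent value y_i and ordered cut-off points
    # alpha_1 < ... < alpha_{m-1}, the observation is assigned to the
    # class whose interval contains y_i.

    alphas = [-1.0, 0.5, 2.0]        # hypothetical boundaries, m = 4 classes

    def assign_class(y_latent):
        # bisect_left counts the boundaries strictly below y_latent,
        # i.e. a class index in 0..m-1 (0 = lowest ordered class);
        # a value exactly on a boundary falls in the lower class (y <= alpha)
        return bisect.bisect_left(alphas, y_latent)

    print([assign_class(y) for y in (-2.0, 0.0, 1.0, 3.0)])  # [0, 1, 2, 3]
    ```

    Note the intervals need not be of equal width, matching the remark above.
    
    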


    3.4 Self-Organizing Maps

    3.4.1  Introduction

    The self-organizing map (SOM) is a combination of a clustering and a projection algorithm at the same time, driven by a neural network. The multi-dimensional input (e.g. companies with multiple financial ratios per company) is projected onto a 2-dimensional map, thereby preserving the local distances between the observations. The projected observations are subsequently merged into clusters, taking the placement on the map into account.

    The model of the self-organizing map was inspired by the human brain: the complex motoric and sensoric control of specific parts of the human body can be pinpointed to specific areas on a flat surface of the brain. More complex functions are appointed larger areas (or clusters) of brain tissue. The resulting man-like shape projected on the brain is known as the homunculus (figure 3-7).

    Figure 3-7 Picture of the homunculus in the brain, drawn by Wilder Penfield


    3.4.2  Overview

    The self-organizing map algorithm involves two steps. The first step projects the observations, the second step clusters the projected observations.

    Projection

    The first step of the algorithm involves projecting the observations onto a 2-dimensional, flexible grid composed of neurons or nodes. The grid is stretched and bended through the input space to form an as good as possible representation of the data. The projection on this grid is a generalization of simple projection (on the flat surface) and projection using Principal Component Analysis (PCA).

    Simple projection simply projects the datapoints on the flat surface defined by the x and y axes. Projection using principal components is more advanced than simple projection (reflecting the intrinsic dimensionality of the data), but is still limited because the observations are projected on a flat plane. The flat plane is aligned according to the axes defined by the two directions exhibiting the largest variance of the data. The projection part of the SOM algorithm (also known as the self-organization process) can be thought of as a non-linear generalization of PCA21. The plane onto which the observations are projected can stretch and bend through the input space, thus more thoroughly capturing the distribution of the observations in the input space.

    The first two types of projections are often too restricted to fully capture the irregularities of the data. The three-dimensional example in figure 3-7 shows this more clearly. The data is clustered in three distinct segments of the cube; simple projection projects the observations on the bottom of this cube (left picture). The flat plane shown in the middle picture is aligned along the first two principal components of the data. A projection on this surface gives a better representation of relative distances in the data set. The rightmost picture shows the flexible, bended and stretched grid used for SOM projection. By following the form of the data an even more

     

    21 Kaski, S., 1997.

    Figure 3-7 Plane of projection: using the X-Y plane, using PCA and using SOM


    accurate representation of relative distances in the data set is given. How the SOM achieves this projection is extensively treated in paragraph 3.5.

    Clustering

    The flexible grid, onto which the observations have been projected, is (for convenient output viewing) returned to a normal, unstretched flat plane and displayed as the map. The form of the grid in the input space remains fixed. The local ordering of the sample is preserved; neighbouring observations in the input space will be neighbouring observations on the map.

    A bottom-up clustering method is used to cluster the projected observations: starting with each observation in a separate cluster, 2 clusters are merged if their relative distance (e.g. Euclidean distance) in the input space is smallest and if they are adjacent in the map. The number of shown clusters varies with the specific step of the algorithm we want to see. One step later in the algorithm means one less cluster shown (another cluster has merged), one step earlier means one more cluster shown.

    Clusters are clear separations of the input space, so observations can only be a member of one cluster (the clusters do not overlap). The clustering algorithm is discussed in paragraph 3.6.




    3.5.3  Mathematical description

    The self-organization process can be described in mathematical form. The input consists of a sample of n-dimensional observations

    x(t) = [x_1(t), x_2(t), …, x_n(t)],

    where t is regarded as the index of the observations in the sample (t = 1, 2, …, T).

    The goal of the algorithm is to determine the values for a set of n-dimensional neurons,

    m_i(T) = [m_i1(T), m_i2(T), …, m_in(T)],

    where i denotes the index of the current neuron in the output map (i = 1, 2, …, I). The neurons are first initialized to arbitrary values. The placement of the neurons in the output map is fixed, so the index i does not change.

    For every t, the algorithm performs the following steps:

    1.  The winning neuron m_c(t) most closely resembling the current observation x(t) is selected (c denotes the winning and i denotes the current neuron):

    ||x(t) − m_c(t)|| = min_i ||x(t) − m_i(t)||.

    2.  The m_i are updated:

    m_i(t+1) = m_i(t) + α(t)·h_ci(t)·[x(t) − m_i(t)].

    The adjustment is monotonically decreasing as the number of iterations increases. This is controlled by the learning rate factor α(t) (0 < α(t) < 1).
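    The two steps can be sketched as a plain training loop. The grid size, the toy data and the decay schedules chosen for α(t) and the neighbourhood function h_ci(t) are illustrative assumptions, not those of any particular SOM package:

    ```python
    import math
    import random

    # Bare-bones self-organization sketch: a small 2-D grid of neurons m_i
    # is fitted to 2-D observations x(t) with the update rule
    #   m_i(t+1) = m_i(t) + alpha(t) * h_ci(t) * [x(t) - m_i(t)].

    random.seed(1)
    ROWS, COLS = 4, 4
    grid = {(r, c): [random.random(), random.random()]
            for r in range(ROWS) for c in range(COLS)}

    # Two well-separated groups of observations.
    data = ([(random.gauss(0.2, 0.05), random.gauss(0.2, 0.05))
             for _ in range(50)]
            + [(random.gauss(0.8, 0.05), random.gauss(0.8, 0.05))
               for _ in range(50)])

    T = 2000
    for t in range(T):
        x = random.choice(data)
        # 1. select the winning neuron c (the best matching unit)
        c = min(grid, key=lambda i: (grid[i][0] - x[0]) ** 2
                                    + (grid[i][1] - x[1]) ** 2)
        # 2. update all neurons; alpha(t) and the neighbourhood radius
        #    sigma(t) shrink monotonically as t grows
        alpha = 0.5 * (1 - t / T)
        sigma = 2.0 * (1 - t / T) + 0.5
        for i, m in grid.items():
            grid_dist2 = (i[0] - c[0]) ** 2 + (i[1] - c[1]) ** 2
            h = math.exp(-grid_dist2 / (2 * sigma ** 2))   # h_ci(t)
            m[0] += alpha * h * (x[0] - m[0])
            m[1] += alpha * h * (x[1] - m[1])

    # After training, neurons concentrate near the two data groups.
    bmus = {min(grid, key=lambda i: (grid[i][0] - x[0]) ** 2
                                    + (grid[i][1] - x[1]) ** 2)
            for x in data}
    print(len(bmus), "distinct best matching units used")
    ```

    The Gaussian h_ci(t) plays the role of the neighbourhood function: neurons close to the winner on the grid are pulled strongly toward x(t), distant ones hardly at all.
    
    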


    … the training process while keeping the same results. For more information on the batch training process please refer to Deboeck22.

    3.5.4  A three-dimensional example

    An example using a three-dimensional input space is more representative of a real-world application of the SOM: a high-dimensional input space mapped to a two-dimensional output grid. In figure 3-11 the neurons are placed in a three-dimensional input space with three groups of data. Please note that the network is not random but linearly initialized according to the first two principal components of the data set.

    The distribution of the neurons after the self-organization process is shown in figure 3-12. The network, still a 2-dimensional lattice, has curved and stretched to form an as good as possible fit to the original data. The neurons are concentrated in those areas of the input space containing the most observations. The largest separation occurs between the cluster of observations in the bottom half of the cube and the two clusters of observations in the upper half of the cube.

    22 Deboeck, G., 1998, page 167.

    Figure 3-11 Linearly initialized network in a 3D input space    Figure 3-12 Distribution of the neurons after self-organization



    3.6 SOM visualization and clustering

    The previous treatment of the inner workings of SOM is generic for most implementations, but the available visualizations of the final map vary per software package. We have made use of the Viscovery SOMine 3.0 Enterprise edition program, generously supplied to us by Eudaptics in Austria23. Some of the shown visualization and cluster capabilities cannot be found in other programs24.

    3.6.1  Maps

    The visible output of the algorithm consists of the map, which is an unstretched, flattened representation of the grid in the input space. Observations mapped to a specific neuron in the input space appear on the same specific neuron (grid point) in the map. Neighbouring observations in the input space are neighbouring observations on the map.

    The map has several manifestations:

    -  Clusters: to view the clustering of neurons25.

    -  U-matrix: to view relative distances between neurons (in the input space).

    -  Component planes: to view distributions of separate variables over the map.

    It is important to remember that for each map manifestation the distribution of observations over the map does not change. We are looking at the same map, but each time different information is shown.

    Unified distance matrix

    The Unified distance matrix (U-matrix) can be used to assess relative distances between neurons in the input space. When translating the grid in the input space to the output map, distance information is lost (the grid is returned to an unstretched, flattened state). This information is re-introduced by colour coding the map: greater differences between the neurons in the input space translate to darker colours in the map.

    23 Eudaptics, 1999.
    24 In addition to this, the intuitive interface and the ability to work with Excel files make it an attractive package.
    25 The clusters and specific clustering algorithms will be treated in paragraph 3.6.3.

    Figure 3-13 U-matrix
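    One way such colour values can be computed (a sketch, not necessarily Viscovery's actual implementation): each neuron is assigned the average input-space distance to its immediate grid neighbours. The 3×3 grid of one-dimensional neuron weights below is a hypothetical example with a clear border between its left and right columns:

    ```python
    import math

    # Hypothetical trained grid: two groups of neuron weights with a
    # sharp border between columns 1 and 2.
    weights = {(r, c): [0.0] if c < 2 else [10.0]
               for r in range(3) for c in range(3)}

    def u_matrix(weights):
        u = {}
        for (r, c), m in weights.items():
            # immediate grid neighbours (up, down, left, right)
            neigh = [weights[(r + dr, c + dc)]
                     for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                     if (r + dr, c + dc) in weights]
            # average input-space distance to those neighbours
            u[(r, c)] = sum(math.dist(m, nm) for nm in neigh) / len(neigh)
        return u

    u = u_matrix(weights)
    print(u[(1, 0)], u[(1, 1)], u[(1, 2)])  # border shows up at columns 1-2
    ```

    Large U-values (dark colours) thus mark cluster borders; near-zero values mark the interior of a cluster.
    
    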


    The U-matrix for the earlier used three-dimensional example is shown in figure 3-13. The implicit clustering is visible as groups of neurons having almost equal colour separated by nodes with distinctly different colours. In this U-matrix one very clear cluster at the right of the map can be found. The two clusters at the left half, separated in the middle, are less clear. This agrees with the placement of the three clusters of observations, as can be checked in figure 3-12.

    Component planes

    A component plane is a manifestation of the map whereby the values for only one of the variables (a component) are shown. In this way the distribution of this separate variable over the map can easily be inspected. When comparing two different component planes of the same map, highly correlated variables would stand out because of the likeness of their component planes. Components not contributing much to the distribution of the observations show a more random pattern in their component planes; they are only contributing noise to the clustering.

    Often a display of the U-matrix surrounded by the component planes of all the variables is created. Figure 3-14 shows such a display for our three-dimensional example. The three component planes represent the X, Y and Z variables.


    Figure 3-14 U-matrix and component planes for all three variables

    The display shows that no two variables are highly correlated. The right cluster is characterized by small values for all variables. The top-left cluster is characterized by high values for X and Z, the bottom-left cluster displays high values for Y and Z. This also agrees with the placement of the clusters of observations in figure 3-12.

    3.6.2  Map quality

    We can discern two types of map quality:

    -  The data representation accuracy.

    -  The data set topology representation accuracy.

    Both make use of the ‘Best Matching Unit’ concept.

    Figure 3-15 Best matching unit for vector [2, 0, 1]


    Data set topology representation

  • 8/17/2019 ViscoVery SOM (clustering)

    53/173

    S O M v i s u a l i z a t i o n a n d c l u s t e r i n g

    45

The data set topology representation accuracy can be measured in several ways. One error function often used is the topographic error measure: the percentage of sample vectors whose first and second best matching units are not adjacent to each other. This also measures the smoothness of the mapping.
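This measure can be sketched directly (an illustrative implementation with hypothetical data; adjacency is taken here as a Chebyshev grid distance of 1 between the two units):

```python
import numpy as np

def topographic_error(weights, grid_coords, samples):
    """Fraction of samples whose first and second best matching units
    are not adjacent on the map grid."""
    errors = 0
    for x in samples:
        d = np.linalg.norm(weights - x, axis=1)
        first, second = np.argsort(d)[:2]
        # Adjacent means a (Chebyshev) grid distance of 1 between the units.
        if np.abs(grid_coords[first] - grid_coords[second]).max() > 1:
            errors += 1
    return errors / len(samples)

# A 1-dimensional 3-neuron map that folds back in the input space:
# neurons 0 and 2 are close in input space but far apart on the grid.
weights = np.array([[0.0], [100.0], [10.0]])
grid = np.array([[0], [1], [2]])
err = topographic_error(weights, grid, np.array([[1.0], [99.0]]))
```

For the first sample the two nearest neurons (0 and 2) are not grid neighbours, for the second they are, giving an error of 0.5.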

A more visual tool for evaluating the data set topology representation accuracy is the frequency map. This manifestation of the map displays the number of matched observations per neuron (a darker colour means more matched observations). A good map should show equally distributed frequencies on the frequency map (Figure 3-17).

3.6.3  Clusters

It is left to the user to find any clustering of observations based on the U-matrix and the component planes. This so-called implicit clustering can be complemented with other clustering techniques to find an explicit clustering. Most software implementations of the Self-Organizing Map do not incorporate any explicit clustering algorithms. The Viscovery SOMine package includes up to three different clustering methods.

The clustering algorithm frees the user from the difficult task of identifying clusters in the U-matrix. However, by altering parameters of the clustering algorithm the number of shown clusters may vary. The user still has to select the most adequate clustering based on all available information.

The three clustering methods implemented in Viscovery SOMine are Ward's clustering, SOM single linkage and a combination of these two, called SOM-Ward. Instead of directly clustering the original observations these algorithms perform a clustering on the neurons (grid points) in the map, on which the observations are projected. As these neurons form 'best representations' for the observations in the input space there is no qualitative difference. The clustering of the observations can be found by retrieving the projected observations for each neuron in each cluster.

Figure 3-17 Frequency map

Distance measure

Two of the implemented clustering algorithms make use of a specific distance measure, called the Ward distance. It is defined as:

Ward distance:

    d(x, y) = (n_x · n_y) / (n_x + n_y) · || mean_x − mean_y ||²

where x and y are clusters, n_x is the number of neurons in cluster x and mean_x is the vector with averages over all components of the neurons in cluster x, also known as the cluster centroid. Distances between clusters with an evenly distributed number of neurons are enlarged in comparison with distances between clusters with an uneven distribution of the numbers of neurons (see table 3-1). This accelerates the merging of stray small clusters.
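The measure transcribes directly into code (an illustrative sketch; cluster members are given as rows of a NumPy array):

```python
import numpy as np

def ward_distance(cluster_x, cluster_y):
    """Ward distance: n_x*n_y/(n_x+n_y) * ||mean_x - mean_y||^2."""
    x, y = np.asarray(cluster_x), np.asarray(cluster_y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    return len(x) * len(y) / (len(x) + len(y)) * float(diff @ diff)

# Same centroid distance, different size distributions (cf. table 3-1):
even = ward_distance(np.zeros((5, 2)), np.ones((6, 2)))     # sizes 5 and 6
uneven = ward_distance(np.zeros((1, 2)), np.ones((10, 2)))  # sizes 1 and 10
```

With identical centroid distances the evenly split pair (5, 6) is scaled by 2.73 while the uneven pair (1, 10) is scaled by only 0.91, so the stray small cluster is merged first.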

Ward's clustering

This is one of the classic bottom-up methods. It starts with all the neurons in a separate cluster, in each step merging the clusters having the least Ward distance. This distance is calculated without taking the ordering of the map into account; only distances between neurons in the input space are used. When the found clustering is shown on the map, the clusters may appear disconnected: in the input space the neurons are close by, warranting the inclusion in one cluster, but the grid may be bent through the input space in such a way that the neurons are far apart on the map.
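The bottom-up merging can be sketched as follows (a naive O(n³) illustration of the idea, not Viscovery's implementation; all names are hypothetical):

```python
import numpy as np

def wards_clustering(neurons, n_clusters):
    """Agglomerative Ward clustering of neuron weight vectors: start with
    singleton clusters, repeatedly merge the pair with the least Ward
    distance, ignoring the ordering of the neurons on the map."""
    neurons = np.asarray(neurons)
    clusters = [[i] for i in range(len(neurons))]

    def ward(a, b):
        d = neurons[a].mean(axis=0) - neurons[b].mean(axis=0)
        return len(a) * len(b) / (len(a) + len(b)) * float(d @ d)

    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: ward(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # merge the closest pair
    return clusters

# Four neurons forming two obvious groups in a 1-dimensional input space.
groups = wards_clustering([[0.0], [0.1], [5.0], [5.1]], n_clusters=2)
```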

SOM single linkage

This clustering method concentrates on the ordering of the neurons on the map. For each neuron the distance to its neighbour is calculated; when this distance exceeds a certain threshold a separator is set between the neurons in the grid. If the separators form a closed loop the neurons within the loop are marked as a cluster. Because the forming of the clusters only depends on the smallest possible distances between clusters this clustering method is known as a single linkage method.

SOM-Ward

This clustering method is essentially the same as Ward's clustering, but this time the ordering of the neurons on the map is taken into account. Only clusters that are direct neighbours in the map can be merged together to form a larger cluster. The SOM-Ward clustering technique is primarily used in our research. An example of SOM-Ward clustering (using the same 3-dimensional data set) is shown in figure 3-18.

Table 3-1 Ward distances for different cluster sizes

    n_x   n_y   n_x·n_y / (n_x + n_y)
     1    10    0.91
     2     9    1.64
     3     8    2.18
     4     7    2.55
     5     6    2.73

Number of neurons

One of the main settings to choose when training a map is the number of output neurons.

A small number of neurons (smaller than the total number of observations in the train set) means a more general fit is made. The map is better at generalizing and is less sensitive to noise in the data. Figure 3-19 shows the underlying function (y = sin(x)), the train data with some uniformly distributed random noise added, and a 1-dimensional 5-neuron grid.

A large number of neurons (larger than the total number of observations in the train set) means a more precise fit is made, but the map is more sensitive to noise in the data. The neurons do not precisely match the original observations, but almost all observations are mapped to separate neurons. Figure 3-20 shows the same data, now with a 20-neuron grid.

Clearly, the fit of the network to the original data is better in this second case, but the error with respect to the underlying function is also greater. Notice that the network is not completely 'attracted' to outliers, due to the learning rate factor and the neighbourhood function. Although the network has more neurons it still is a fairly good generalizer for the underlying function. Compare this to polynomial fitting; higher order polynomials often lead to large errors!

The number of neurons should be chosen in proportion to the trust one places in his or her data: if a lot of noise is to be expected, then a relatively small number of neurons should be chosen. If the distribution of the sample data very closely resembles the underlying distribution of the population, then a relatively large number of neurons can be initialized. The extra neurons then warrant a more refined representation of the data by the network.

Figure 3-19 Fitting a 5-neuron network to datapoints with underlying function y = sin(x)

Figure 3-20 Fitting a 20-neuron network to datapoints with underlying function y = sin(x)
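The effect can be reproduced with a small self-organizing map. The following is an illustrative sketch (not the thesis's actual setup): a 1-dimensional chain of neurons trained on noisy datapoints from y = sin(x), with a linearly decreasing learning rate and a shrinking Gaussian neighbourhood.

```python
import numpy as np

def train_1d_som(data, n_neurons, n_iter=3000, seed=0):
    """Fit a 1-dimensional chain of neurons to 2-d datapoints."""
    rng = np.random.default_rng(seed)
    # Linear initialization along the x-range of the data.
    w = np.column_stack([
        np.linspace(data[:, 0].min(), data[:, 0].max(), n_neurons),
        np.zeros(n_neurons),
    ])
    grid = np.arange(n_neurons, dtype=float)
    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        alpha = 0.5 * (1.0 - t / n_iter)                 # decreasing learning rate
        sigma = max(n_neurons / 2.0 * (1.0 - t / n_iter), 0.5)
        bmu = int(np.argmin(np.linalg.norm(w - x, axis=1)))
        h = np.exp(-(grid - grid[bmu]) ** 2 / (2.0 * sigma ** 2))
        w += alpha * h[:, None] * (x - w)                # pull neurons towards sample
    return w

rng = np.random.default_rng(1)
xs = rng.uniform(0.0, 2.0 * np.pi, 200)
data = np.column_stack([xs, np.sin(xs) + rng.normal(0.0, 0.1, 200)])
w5 = train_1d_som(data, n_neurons=5)    # general fit, robust to noise
w20 = train_1d_som(data, n_neurons=20)  # closer fit, more noise-sensitive
```

The 5-neuron chain tracks the underlying sine only roughly while averaging out the noise; the 20-neuron chain follows the individual datapoints more closely.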

Initialization

Instead of random initialization one often uses linear initialization. Both can be used, but linear initialization provides a better starting point for the organization of the map. The map is often linearly initialized along the axes provided by the first two principal components of the data set.

Choice of learning rate factor and neighbourhood function

The learning rate factor α(t) is normally a linearly decreasing function over the iterations, but can also be specified as an inverse-time function:

    α(t) = A / (B + t),

where A and B are constants. Earlier and later samples will now be taken into account with approximately similar average weights [26].

The neighbourhood function often has the Gaussian form

    h_ij(t) = exp( −|| r_i − r_j ||² / (2σ²(t)) ),

where r_i denotes the place of neuron i in the map and σ(t) is some monotonically decreasing function over the iterations. Sometimes a simpler form of the neighbourhood function is used, e.g. the bubble function, which just denotes a fixed set of neurons around the winning neuron (in the map). The Gaussian form ensures a global best ordering of the map (the quantization error arrives at a global minimum instead of a local minimum) [27].

[26] Kohonen, T., 1997, page 117.
[27] Kohonen, T., 1997, page 118.
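Both functions are easy to write down in code; a minimal sketch (the constants A, B and the value of σ(t) are chosen arbitrarily for illustration):

```python
import numpy as np

def learning_rate(t, A=1.0, B=100.0):
    """Inverse-time learning rate: alpha(t) = A / (B + t)."""
    return A / (B + t)

def gaussian_neighbourhood(r_i, r_j, sigma_t):
    """h_ij(t) = exp(-||r_i - r_j||^2 / (2 * sigma(t)^2)) for the grid
    coordinates r_i, r_j of two neurons."""
    d2 = float(np.sum((np.asarray(r_i) - np.asarray(r_j)) ** 2))
    return float(np.exp(-d2 / (2.0 * sigma_t ** 2)))

h_near = gaussian_neighbourhood([0, 0], [1, 0], sigma_t=2.0)  # grid neighbour
h_far = gaussian_neighbourhood([0, 0], [5, 5], sigma_t=2.0)   # distant neuron
```

Neurons close to the winner on the grid receive nearly the full update, distant neurons almost none; as σ(t) decreases the updates become ever more local.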

3.7 SOM interpretation and evaluation


In the knowledge discovery process the SOM maps are mainly used for two reasons: describing the data set and predicting values for certain aspects of the data. Each of these applications demands a specific way of evaluating and interpreting the map.

3.7.1  Description

When a map has been created the user has to evaluate the map, determine a good clustering and possibly improve on the clustering so that a clear understanding of the underlying data set emerges.

Determining a good clustering is a non-trivial task. Of course the variables used for map creation have to be suitable for the research setting. Then each specific setting for the used clustering algorithm renders a different number of clusters visible. The map quality measures and the quantitative cluster quality measure form a starting point for determining a good clustering. It is up to the expert user to choose a clustering suitable for the task at hand, specifically by taking any domain knowledge into account.

Improving the clustering

Often one tries to improve on the results (clustering or readability of the display) by reducing the number of variables used in the creation of the map. Removing a variable is warranted only under certain conditions; if these conditions hold then the variable does not contribute much to the generated map and can safely be removed:

-  With or without the variable the distribution of the companies over the map remains equal.
-  With or without the variable the clustering remains the same (same size and same characteristics in terms of individual variables).

Two strong visual clues lead us to these kinds of variables:

-  The component plane of the variable shows a random distribution (Figure 3-21). The component only adds noise to the formation of the map; it does not contribute to the distribution of companies over the map. For instance, this could happen when the variance of the normalized variable is significantly lower than the variance of the other normalized variables.

-  The component plane of the variable bears a close resemblance with the component plane of another variable (Figure 3-21). The variables are then highly correlated (not necessarily in a linear fashion). The dependent variable does not contribute to the distribution of companies over the map, because the same information is already contained in the other variable.


A less strong visual clue also leads us to spurious variables:

-  The distribution of the high and low values of the component plane does not coincide with one or more specific clusters (Figure 3-22). A strong characterization of the clusters (regarding this variable) cannot be given. It is most likely that the variable does not contribute to the clustering, so we choose to remove the variable.

Figure 3-22 Distribution of variable does not coincide with clustering
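A quick numerical check can back up the visual clue of resembling component planes; for instance (an illustrative sketch, testing only linear correlation between the neuron values of two planes):

```python
import numpy as np

def planes_highly_correlated(plane_a, plane_b, threshold=0.9):
    """True when the absolute Pearson correlation between the neuron
    values of two component planes exceeds the threshold."""
    r = np.corrcoef(plane_a.ravel(), plane_b.ravel())[0, 1]
    return bool(abs(r) > threshold)

plane_x = np.linspace(0.0, 1.0, 25).reshape(5, 5)  # a smooth component plane
plane_z = 2.0 * plane_x + 0.1                      # a linearly dependent plane
plane_w = (np.indices((5, 5)).sum(axis=0) % 2).astype(float)  # checkerboard
```

Note that visual comparison of component planes also catches non-linear dependence, which a Pearson correlation like this one misses.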

Examples

In appendix III and IV two examples of descriptive SOM use can be found, one on a medical domain and the second on a database marketing domain. Chapter 4 also uses SOM in a descriptive way to evaluate the link between credit ratings and financial ratios.

Figure 3-21 Random and highly correlated component planes

a relatively short time span. When using all the variables for map creation, and then subsequently removing variables not contributing much to the prediction power, we can be certain that all contributing combinations are found. Unfortunately this strategy is more time consuming.

Using the target variable as a train variable

For classification purposes most often multi-layered backpropagation networks are used. For these networks it is possible to train the network based on the train variables and the target variable. For each observation the state of the train variables is shown to the network. The network gives a prediction for class membership, and this prediction is compared with the real class membership (the target variable), leading to adjustments in the network to account for any deviations (the backpropagation step). This is also known as supervised training; the network adapts to better distinguish the differences between the classes the observations can belong to.

For the SOM as a feed forward network, it is not possible to directly match the real value of the target variable with the predicted value of the target variable. But we can simulate it by using the target variable as a train variable during map creation; this is known as semi-supervised training [28]. How this can be beneficial to a distinction between observations in different clusters is illustrated in the following figures. Without using the target variable as a train variable, the map in figure 3-23 (consisting of just two neurons) is created using only 1 variable or 1 dimension. A distinction between the observations is difficult to make; it is hard to see to which neuron the new observation (green) is matched in the one-dimensional final map (the distance to either neuron is equal).

Figure 3-23 SOM network when only 1-dimensional (x-axis) information about the datapoints (red plusses) is available. The best matching neuron for the new observation (green star) is difficult to determine.


When using the target variable as a train variable, the map is created using two dimensions (figure 3-24). The placement of the neurons shifts; it is much clearer that the new observation matches the rightmost neuron. Remember that we do not have the value of the target variable for the new observation, so we can still only use the x-dimension to determine the best matching unit for this new observation.

Of course this particular example only illustrates one possible outcome of using the target variable as a train variable. A deeper investigation into the effects of this technique lies outside the scope of this thesis.
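The matching step for new observations can be sketched as follows (illustrative only; the weights are hypothetical): the map was trained on both dimensions, but the best matching unit for a new observation is searched using the train dimensions alone.

```python
import numpy as np

def bmu_excluding_target(weights, x, train_dims):
    """Best matching unit using only the train dimensions; the target
    dimension of the neuron weights is ignored during matching."""
    d = np.linalg.norm(weights[:, train_dims] - x[train_dims], axis=1)
    return int(np.argmin(d))

# Two neurons after semi-supervised training: column 0 is the train
# variable (x-axis), column 1 the target variable (y-axis).
weights = np.array([[0.8, 0.0],
                    [1.6, 5.0]])
new_obs = np.array([1.0, np.nan])  # target value unknown for a new observation
bmu = bmu_excluding_target(weights, new_obs, train_dims=[0])
```

The predicted target value for the new observation can then be read off from the target component of the matched neuron.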

Figure 3-24 SOM network when the target variable (y-axis) is also used when training the network. The best matching neuron for the new observation (green star) is easy to determine.

Examples

An example of the use of SOM as a prediction model can be found in chapter 5: financial ratios are used to classify companies according to creditworthiness.

[28] Kohonen, T., 1997.

3.8 SOM questions and answers


Q: Is it a neural network?
A: Yes, but a very special one; a feed forward neural network with no hidden layers. The inner workings of the SOM are relatively simple (see paragraph 3.5) and therefore much clearer than for networks using multiple layers and backpropagation.

Q: Is it a black box?
A: No, the SOM is nothing more than the projection on a non-linear plane drawn through the observations. The form of the plane is set using a very strict and clear algorithm, and the form of the plane is fixed after the algorithm has completed. The component planes give us insight into the contribution of individual variables to the clustering. Other neural nets use multiple layers and backpropagation, making the inner workings of the network more difficult to comprehend.

Q: How can the neural network be flattened and unstretched for output viewing (the map) but still keep the fixed form in the input space (fixed after completing the algorithm)?
A: It is not really the grid in the input space that is flattened and unstretched, rather a direct representation of this grid in 2 dimensions. Each neuron in the input space directly corresponds with a grid point in the 2-dimensional map.

Q: Is there a chance of overfitting the neural network when using a large number of neurons (larger than the number of observations)?
A: This depends on your definition of overfitting. The SOM algorithm includes automatic 'dampening' functions in the form of the learning rate factor and the neighbourhood function. When using a large number of neurons the network more precisely represents the underlying data set; some would consider this overfitting. However, thanks to the dampening functions the neurons are not completely attracted by the specific observations.

Q: Does the order in which the observations are being processed by the self-organization process make any difference for the final results?
A: No, because instead of processing the observations just once, often multiple iterations are used. Together with the used dampening functions the map converges to a stable form.

Q: What is the statistical significance of results found with SOM?
A: The SOM can be used in two ways: (1) to give an accurate description of the data set, and (2) to predict values for one or more variables. For descriptive use several SOM and cluster quality measures exist (see paragraph 3.6), but (like other visualization techniques) no general statistical 'goodness' indicator exists.

For predictive use we should see the SOM as a form of non-linear regression, without a presupposed form of the fitted function. Because of the non-linearity of the model the direct contributions of the individual variables are difficult to assess. The total performance of the model can be measured and validated using common statistical techniques.

    3.9 Summary


Chapter 3 covered the theoretical foundations of SOM. We viewed the place of Self-Organizing Maps in the knowledge discovery process, and we described some projection, clustering and classification techniques related to SOM. The SOM is a combination of non-linear projection and hierarchical clustering, driven by a simple feed forward neural network. The observations are projected on a flexible grid of neurons that stretches and bends to accommodate the distribution of the data in the input space. After the network has found its final form, it is displayed in a flattened state as a map. The observations projected on this map are then clustered, according to similarity of the used variables.

A Self-Organizing Map can be used in two ways: as a descriptive analysis tool, and as a prediction model. For use in a descriptive setting the map display and the clustering are most important. Visually comparing the clusters and other parts of the SOM display provides a good and insightful overview of the underlying data set. When deploying the SOM as a prediction model, we are more interested in the distribution of the companies over the map (or equivalently, the form of the map) than the clustering. The SOM then functions as a semi-parametric (possibly non-linear) regression model.


4 descriptive analysis


The paragraphs in chapter 4 form an account of our descriptive analysis, using the SOM as a visual exploration tool. We answer questions 3 and 4 from the introduction:

3.  Is it possible to find a logical clustering of the companies, based on the financial statements of these companies?

4.  If such a clustering is found, does this clustering coincide with levels of creditworthiness of the companies in a cluster?

Paragraph 1 covers the basic data analysis. Paragraph 2 explores the possibility of clustering companies based on financial data. In paragraph 3 we then compare the found clustering with the credit ratings of the clustered companies. Paragraph 4 reviews the performed sensitivity analysis and in paragraph 5 we benchmark the SOM results to a principal components analysis.

4.1 Basic data analysis


Our basic data analysis comprises the first three steps of the knowledge discovery process, namely data selection, data pre-processing and data transformation.

4.1.1  Data selection

The