An R tutorial on statistical, naïve and intuitive predictors in credit risk classification
Rodolfo Vanzini, Bologna
September 15, 2015
By quoting extensively from the extraordinary work of Kahneman (2011), I would like to explain the rationale behind this work on credit risk classification. I use it in training sessions for loan officers employed by banks, held in traditional classroom settings like the ones I have conducted over the last three years.
Why are experts so inferior to algorithms? One reason [. . . ] is that experts try to be clever, think outside the box, and consider combinations of features in making their predictions. [. . . ] Complexity may work in the odd case, but more often than not it reduces validity. (Kahneman 2011, page 224)
Another reason for the inferiority of expert judgment is that humans are incorrigibly inconsistent in making summary judgments of complex information. When asked to evaluate the same information twice, they frequently give different answers. (Kahneman 2011, page 224)
The research suggests a surprising conclusion: to maximize predictive accuracy, final decisions should be left to formulas, especially in low-validity environments. (Kahneman 2011, page 225)
[Dawes] observed that the complex statistical algorithm adds little or no value. One can do just as well by selecting a set of scores that have some validity for predicting the outcome and adjusting the values to make them comparable [. . . ] it is possible to develop useful algorithms without any prior statistical research. Simple equally weighted formulas based on existing statistics or on common sense are often very good predictors of significant outcomes. [. . . ] The important conclusion of this research is that an algorithm that is constructed on the back of an envelope is often good enough to compete with an optimally weighted formula, and certainly good enough to outdo expert judgment. This logic can be applied to many domains, ranging from the selection of stocks by portfolio managers to the choices of medical treatments by doctors or patients. (Kahneman 2011, page 226)
Whenever we can replace human judgment by a formula, we should at least consider it. (Kahneman 2011, page 233)
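The "back of the envelope" formula that Dawes describes can be sketched in a few lines of R. This is a hypothetical helper, not part of the original tutorial. It makes two assumptions:

- The predictors are made comparable by converting each one to z-scores.
- The analyst supplies the direction (+1 or -1) in which each predictor points towards default.

```r
# Hypothetical helper: an equally weighted ("improper") linear score.
# X:     data frame or matrix of numeric predictors
# signs: +1 if higher values point towards default, -1 otherwise
equal.weight.score <- function(X, signs) {
  Z <- scale(as.matrix(X))   # standardise every column to mean 0, sd 1
  as.numeric(Z %*% signs)    # add up the standardised columns with unit weights
}

# Toy usage with the two ratios used later on: a high DMP and a low EBITOF
# both point towards default
toy <- data.frame(DMP = c(1, 2, 3), EBITOF = c(3, 2, 1))
equal.weight.score(toy, signs = c(1, -1))  # returns -2 0 2
```

No weights are estimated from the outcome. The only judgment involved is the sign of each predictor, which is what makes the score cheap to build and hard to overfit.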
The data are simulated so that the sample has the desired features: a given default frequency, a given degree of overlap between non-defaulted and defaulted companies, and key financial ratios as predictors. The code below generates two predictors: the debt-to-book-value ratio (DMP, from the original Italian debito-mezzi-propri) and the EBIT-to-interest-payments ratio (EBITOF, from the original Italian EBIT-oneri-finanziari).
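n <- 1000
p <- 0.82 # proportion of non-defaulters
set.seed(321) # for reproducibility
# Simulated ratios: defaulters ('Si') have a higher DMP and a lower EBITOF
DMP.no <- rnorm(n = n * p, mean = 1.5, sd = 0.75)
DMP.si <- rnorm(n = n * (1 - p), mean = 3.0, sd = 0.75)
EBITOF.no <- rnorm(n = n * p, mean = 2.0, sd = 0.75)
EBITOF.si <- rnorm(n = n * (1 - p), mean = 0.75, sd = 0.75)
# stringsAsFactors = TRUE makes Default a factor (the default changed to
# FALSE in R 4.0.0); glm() and contrasts() below need a factor response
df <- data.frame(Default = c(rep('No', p * n),
                             rep('Si', (1 - p) * n)),
                 DMP = c(DMP.no, DMP.si),
                 EBITOF = c(EBITOF.no, EBITOF.si),
                 stringsAsFactors = TRUE)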
str(df)
## 'data.frame': 1000 obs. of 3 variables:
## $ Default: Factor w/ 2 levels "No","Si": 1 1 1 1 1 1 1 1 1 1 ...
## $ DMP : num 2.779 0.966 1.292 1.41 1.407 ...
## $ EBITOF : num 0.878 1.036 3.052 2.021 3.426 ...
# Adjust DMP for negative values: anything below zero is set to zero
df$DMP <- pmax(df$DMP, 0)
head(df)
## Default DMP EBITOF
## 1 No 2.7786774 0.8777016
## 2 No 0.9659711 1.0358616
## 3 No 1.2915113 3.0520025
## 4 No 1.4102632 2.0214981
## 5 No 1.4070295 3.4255801
## 6 No 1.7011378 1.1667068
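Before going further, it can help to confirm numerically that the simulated sample has the intended features. The sketch below is not part of the original tutorial:

- check.sample() is a hypothetical helper.
- The block rebuilds the sample with the same parameters and seed under a new name (df.check), so it runs on its own without overwriting df.

```r
# Hypothetical check: rebuild the simulated sample and summarise the
# features it was designed to have
set.seed(321)
n.obs <- 1000
p.no <- 0.82
df.check <- data.frame(
  Default = factor(c(rep("No", p.no * n.obs), rep("Si", (1 - p.no) * n.obs))),
  DMP = c(rnorm(p.no * n.obs, mean = 1.5, sd = 0.75),
          rnorm((1 - p.no) * n.obs, mean = 3.0, sd = 0.75)),
  EBITOF = c(rnorm(p.no * n.obs, mean = 2.0, sd = 0.75),
             rnorm((1 - p.no) * n.obs, mean = 0.75, sd = 0.75))
)
df.check$DMP <- pmax(df.check$DMP, 0)  # negative DMP values set to zero

check.sample <- function(data) {
  list(
    default.freq = mean(data$Default == "Si"),  # share of defaulters
    means = aggregate(cbind(DMP, EBITOF) ~ Default, data = data, FUN = mean)
  )
}

chk <- check.sample(df.check)
chk$default.freq  # 0.18 by construction
chk$means         # class-wise means of DMP and EBITOF
```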
The RColorBrewer package is loaded to set up a four-color palette used to enhance the graphics.
require(RColorBrewer)
pal <- brewer.pal(4, "Set1")
pal
## [1] "#E41A1C" "#377EB8" "#4DAF4A" "#984EA3"
After setting up the palette, the sample should be plotted to inspect it for possible irregularities with respect to the desired features. To do so, the ggplot2 package is required, and the scatter plot options are adjusted to get a plot similar to base R.
require(ggplot2)
pl <- ggplot(df, aes(DMP, EBITOF, color = Default, shape = Default)) +
geom_point(size = 3, alpha = 0.5) +
scale_color_manual(values = c(pal[2], pal[1])) +
scale_shape_manual(values = c(1,3))
pl
[Figure: scatter plot of EBITOF against DMP for the full sample, with color and shape by Default (No, Si)]
Overlapping histograms of non-defaulted and defaulted companies for both key financial ratios:
pl.1 <- ggplot(df, aes(DMP, fill = Default)) +
  scale_fill_manual(values = c(pal[2], pal[1]))
pl.2 <- ggplot(df, aes(EBITOF, fill = Default)) +
  scale_fill_manual(values = c(pal[2], pal[1]))
# subset() needs a logical condition (==): with `Default = "Si"` the argument
# is silently ignored and each layer would draw the whole data set
pl.1 + geom_histogram(data = subset(df, Default == "Si"),
                      binwidth = 0.1,
                      alpha = 0.5,
                      position = 'identity') +
  geom_histogram(data = subset(df, Default == "No"),
                 binwidth = 0.1,
                 alpha = 0.5,
                 position = 'identity')
[Figure: overlapping histograms of DMP for non-defaulted (No) and defaulted (Si) companies]
pl.2 + geom_histogram(data = subset(df, Default == "Si"),
                      binwidth = 0.1, alpha = 0.5, position = 'identity') +
  geom_histogram(data = subset(df, Default == "No"),
                 binwidth = 0.1, alpha = 0.5, position = 'identity')
[Figure: overlapping histograms of EBITOF for non-defaulted (No) and defaulted (Si) companies]
Next, we draw a manageable random subset of the data to hand out to students and add some false predictors to it:
# subset data
set.seed(666)
s.s <- 200
require(dplyr)
df.sm <- sample_n(df, size = s.s, replace = FALSE)
# False predictors, drawn at random and independently of Default
df.sm$SaleChg <- rnorm(n = s.s,
mean = 0.0,
sd = 10.0)
df.sm$ClienteStorico<- sample(c('Si', 'No'),
size = s.s,
replace = TRUE)
df.sm$Settore <- sample(c("Meccanica industriale",
"Servizi",
"Meccanica automotive",
"Dettaglio",
"Edile"),
size = s.s,
replace = TRUE)
df.sm$Outlook <- sample(c('Positive', 'Stable', 'Negative'),
size = s.s,
replace = TRUE)
set.seed(1)
z <- sort(sample(nrow(df.sm), nrow(df.sm) * 0.5))
train <- df.sm[z,]
test <- df.sm[-z,]
Both the train and test samples are also copied into two further data frames, trainn and testt, in case they are needed later (the seed will be set again to new values to generate new random samples).
trainn <- train
testt <- test
Before continuing, the data are saved to produce hand-outs for students:
write.table(train, file = "train_100.csv",
sep = ";",
dec = ",")
write.table(test, file = "test_100.csv",
sep = ";",
dec = ",")
Perform some exploratory data analysis on the df.sm data frame (train and test samples combined):
par(mfrow = c(2,3))
T1 <- table(df.sm$ClienteStorico, df.sm$Default,
dnn = c('Cliente storico', 'Default'))
mosaicplot(T1,
main = 'Cliente storico per default')
T2 <- table(df.sm$Settore, df.sm$Default,
dnn = c('Settore', 'Default'))
mosaicplot(T2,
main = 'Settore per default')
T3 <- table(df.sm$Outlook, df.sm$Default,
dnn = c('Outlook', 'Default'))
mosaicplot(T3,
main = 'Outlook per default')
boxplot(EBITOF ~ Default, data = df.sm,
main = 'EBITOF per default')
boxplot(DMP ~ Default, data = df.sm,
main = 'DMP per default')
boxplot(SaleChg ~ Default, data = df.sm,
main = 'Sales chg. per default')
[Figure: 2-by-3 panel with mosaic plots of Cliente storico (long-standing client), Settore (sector) and Outlook by Default, and box plots of EBITOF, DMP and sales change (SaleChg) by Default]
# barplot(table(df.sm$ClienteStorico, df.sm$Default)/nrow(df.sm),
#         names.arg = c('Default No', 'Default Si'),
#         main = 'Cliente storico per default')
par(mfrow = c(1,1))
s <- chisq.test(T1)
print(s)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: T1
## X-squared = 3.8975, df = 1, p-value = 0.04836
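The p-value of 0.048 deserves a second look. ClienteStorico was generated at random, independently of Default, so this "significant" association is a false positive.

The hedged sketch below, which is not part of the original tutorial, repeats the same test on many freshly drawn random flags and counts how often p < 0.05. The sample size of 200 and the 36 defaulters are assumptions chosen to resemble df.sm. Yates' continuity correction makes the test somewhat conservative, so the share should come out at or a little below 5%.

```r
# Hypothetical simulation: how often does a yes/no flag drawn at random look
# "significant" against Default at the 5% level?
set.seed(123)
n.sim <- 1000
default.status <- factor(rep(c("No", "Si"), times = c(164, 36)))  # assumed: 200 obs, 36 defaults
p.values <- replicate(n.sim, {
  random.flag <- sample(c("Si", "No"), size = 200, replace = TRUE)
  # same test as above: Pearson's chi-squared with Yates' correction on a 2 x 2 table
  chisq.test(table(random.flag, default.status))$p.value
})
mean(p.values < 0.05)  # share of false positives
```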
Scatter-plot matrix of the three numeric predictors (DMP, EBITOF and SaleChg):
pairs(df.sm[, 2:4])
[Figure: scatter-plot matrix (pairs plot) of DMP, EBITOF and SaleChg]
pl.sm <- ggplot(df.sm, aes(DMP, EBITOF,
color = Default,
shape = Default)) +
geom_point(size = 3, alpha = 1.0) +
scale_color_manual(values = c(pal[2], pal[1])) +
scale_shape_manual(values = c(1,3))
pl.sm + stat_smooth(aes(group = 1), method = 'lm')
[Figure: scatter plot of EBITOF against DMP for df.sm, with a linear regression line fitted to all points]
require(MASS)
pr.dmp.fit <- lda(Default ~ DMP, data = df.sm)
pr.ebitof.fit <- lda(Default ~ EBITOF, data = df.sm)
# Decision threshold of a one-predictor, two-class LDA:
# A = midpoint of the two class means, B = log(prior_Si / prior_No),
# C = prior-weighted within-class variance / (mean_No - mean_Si)
dec.rule <- function(lda, df, x.name){
  A <- mean(lda$means)
  B <- log(lda$prior[2]) - log(lda$prior[1])
  s2.k <- t(tapply(df[[x.name]], df$Default, var)) %*% lda$prior
  C <- s2.k / (lda$means[1] - lda$means[2])
  as.numeric(A + B * C)
}
dr.dmp <- dec.rule(pr.dmp.fit, df.sm, 'DMP')
dr.ebitof <- dec.rule(pr.ebitof.fit, df.sm, 'EBITOF')
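For reference, the threshold computed by the decision-rule code above is the point where the two linear discriminant scores of a one-predictor LDA with common variance $\sigma^2$ are equal. At that point the posterior probability of default is 0.5. A sketch of the derivation follows, writing $\mu_{\mathrm{No}}, \mu_{\mathrm{Si}}$ for the class means and $\pi_{\mathrm{No}}, \pi_{\mathrm{Si}}$ for the priors:

$$
\delta_k(x) = \frac{x\,\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log \pi_k, \qquad k \in \{\mathrm{No}, \mathrm{Si}\}
$$

$$
\delta_{\mathrm{Si}}(x^*) = \delta_{\mathrm{No}}(x^*) \iff x^* = \underbrace{\frac{\mu_{\mathrm{No}} + \mu_{\mathrm{Si}}}{2}}_{A} + \underbrace{\log\frac{\pi_{\mathrm{Si}}}{\pi_{\mathrm{No}}}}_{B} \cdot \underbrace{\frac{\sigma^2}{\mu_{\mathrm{No}} - \mu_{\mathrm{Si}}}}_{C}
$$

The code estimates $\sigma^2$ with s2.k, the average of the two within-class sample variances weighted by the priors. MASS::lda() instead pools the within-class deviations with divisor $n - 2$, so the dashed line can sit marginally off the boundary implied by predict().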
pl.1.sm <- ggplot(df.sm, aes(DMP, fill = Default)) +
scale_fill_manual(values = c(pal[2], pal[1]))
pl.2.sm <- ggplot(df.sm, aes(EBITOF, fill = Default)) +
scale_fill_manual(values = c(pal[2], pal[1]))
pl.1.sm + geom_histogram(data = subset(df.sm, Default == "Si"),
                         binwidth = 0.25,
                         alpha = 0.5,
                         position = 'identity') +
  geom_histogram(data = subset(df.sm, Default == "No"),
                 binwidth = 0.25,
                 alpha = 0.5,
                 position = 'identity') +
  geom_vline(xintercept = dr.dmp,
             linetype = 'dashed')
pl.2.sm + geom_histogram(data = subset(df.sm, Default == "Si"),
                         binwidth = 0.25, alpha = 0.5, position = 'identity') +
  geom_histogram(data = subset(df.sm, Default == "No"),
                 binwidth = 0.25, alpha = 0.5, position = 'identity') +
  geom_vline(xintercept = dr.ebitof,
             linetype = 'dashed')
[Figure: overlapping histograms of DMP and of EBITOF for df.sm by Default, each with the LDA decision threshold drawn as a dashed vertical line]
1 Statistical & naïve predictors
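Fit a logistic regression on all the variables to check whether the false predictors turn out to be significant. The results in the output below are:

- DMP and EBITOF are strongly significant.
- SaleChg, ClienteStorico and Settore are not significant.
- OutlookPositive (p = 0.037) crosses the 5% threshold purely by chance, since Outlook was generated at random.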
lgt.null <- glm(Default ~ .,
data = df.sm,
family = 'binomial')
summary(lgt.null)
##
## Call:
## glm(formula = Default ~ ., family = "binomial", data = df.sm)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.05817 -0.13653 -0.02901 -0.00372 2.02415
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.89273 2.12989 -3.236 0.00121 **
## DMP 3.27608 0.73848 4.436 9.15e-06 ***
## EBITOF -2.96387 0.74027 -4.004 6.23e-05 ***
## SaleChg -0.08523 0.05387 -1.582 0.11360
## ClienteStoricoSi 1.06459 0.81142 1.312 0.18951
## SettoreEdile 0.43600 1.18740 0.367 0.71348
## SettoreMeccanica automotive -1.00954 1.43137 -0.705 0.48062
## SettoreMeccanica industriale 1.22437 1.18360 1.034 0.30093
## SettoreServizi -0.82071 1.30597 -0.628 0.52972
## OutlookPositive 2.26890 1.08911 2.083 0.03723 *
## OutlookStable 1.21965 1.16211 1.050 0.29394
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 185.491 on 199 degrees of freedom
## Residual deviance: 49.379 on 189 degrees of freedom
## AIC: 71.379
##
## Number of Fisher Scoring iterations: 8
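The Wald tests in the summary judge each false predictor on its own. A complementary check, sketched here and not part of the original tutorial, is a likelihood-ratio test of all the false predictors jointly. It compares a model with DMP and EBITOF only against the full model via anova(..., test = "Chisq").

To keep the block self-contained, it runs on freshly simulated data. The data frame sim, its seed and its parameters are assumptions that mirror the tutorial's set-up, so the numbers will not match the output above. With the tutorial's own objects, the equivalent call would be anova(glm(Default ~ DMP + EBITOF, data = df.sm, family = 'binomial'), lgt.null, test = 'Chisq').

```r
# Hypothetical sketch: two informative ratios plus two noise predictors
set.seed(2015)
n.obs <- 200
is.si <- runif(n.obs) < 0.18                      # assumed 18% default rate
sim <- data.frame(
  Default = factor(ifelse(is.si, "Si", "No"), levels = c("No", "Si")),
  DMP     = rnorm(n.obs, mean = ifelse(is.si, 3.0, 1.5), sd = 0.75),
  EBITOF  = rnorm(n.obs, mean = ifelse(is.si, 0.75, 2.0), sd = 0.75),
  SaleChg = rnorm(n.obs, mean = 0, sd = 10),      # noise
  Outlook = factor(sample(c("Positive", "Stable", "Negative"),
                          n.obs, replace = TRUE))  # noise
)

fit.small <- glm(Default ~ DMP + EBITOF, data = sim, family = binomial)
fit.full  <- glm(Default ~ ., data = sim, family = binomial)

# Likelihood-ratio test: do the noise predictors jointly improve the fit?
lrt <- anova(fit.small, fit.full, test = "Chisq")
lrt
```

The test has three degrees of freedom here: one for SaleChg and two for the three-level Outlook factor.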
plot(data = df.sm, EBITOF ~ DMP,
main = 'Sample train + test',
cex = 1.5)
plot(data = df.sm, EBITOF ~ DMP,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Sample train + test (defaulters displayed)',
cex = 1.5)
[Figure: two scatter plots of EBITOF against DMP for the train + test sample, the second with defaulters highlighted]
par(mfrow = c(1,2))
plot(data = train, EBITOF ~ DMP,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Train',
cex = 1.5)
plot(data = test, EBITOF ~ DMP,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Test',
cex = 1.5)
[Figure: scatter plots of EBITOF against DMP for the train (left) and test (right) samples, with defaulters highlighted]
par(mfrow=c(1,1))
plot(data = train, EBITOF ~ DMP,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Train sample data set',
cex = 1.5)
[Figure: train sample data set, EBITOF against DMP with defaulters highlighted]
Display the train data sample, followed by box plots of both ratios by default status:
plot(data = train, EBITOF ~ DMP,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
cex = 1.5)
par(mfrow = c(1,2))
boxplot(data = train, EBITOF ~ Default,
col = c(pal[2], pal[1]), ylab='EBIT/OF',
xlab = 'Default')
boxplot(data = train, DMP ~ Default,
col = c(pal[2], pal[1]), ylab='D/MP',
xlab = 'Default')
par(mfrow = c(1,1))
[Figure: train sample scatter plot of EBITOF against DMP, followed by box plots of EBIT/OF and D/MP by Default]
Fit a logit model to the train data sample and check that R has coded the response Default correctly. contrasts(train$Default) shows that a dummy variable has been created, with 1 denoting default status (Si).
lgt.fit <- glm(Default ~ DMP + EBITOF,
data = train,
family = 'binomial')
contrasts(train$Default)
## Si
## No 0
## Si 1
summary(lgt.fit)
##
## Call:
## glm(formula = Default ~ DMP + EBITOF, family = "binomial", data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.30735 -0.22448 -0.06741 -0.02830 2.78986
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.8695 2.3823 -2.464 0.013746 *
## DMP 3.0051 0.8837 3.400 0.000673 ***
## EBITOF -1.8921 0.8158 -2.319 0.020383 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 77.277 on 99 degrees of freedom
## Residual deviance: 25.021 on 97 degrees of freedom
## AIC: 31.021
##
## Number of Fisher Scoring iterations: 7
Predict the probability of default on the test data sample:
lgt.probs <- predict(lgt.fit, newdata = test, type = 'response')
Prepare a grid covering the plotting canvas to highlight the accept and reject areas on the chart. The x and y ranges are taken from DMP and EBITOF in the df.sm data frame. expand.grid() generates every x-y coordinate pair needed to cover the canvas.
xlim <- range(df.sm$DMP)
xlim
## [1] 0.000000 4.815215
ylim <- range(df.sm$EBITOF)
ylim
## [1] -0.8916338 3.9687379
x <- seq(xlim[1], xlim[2], length = s.s/4)
y <- seq(ylim[1], ylim[2], length = s.s/4)
grid <- expand.grid(x = x,y = y)
names(grid) <- c('DMP', 'EBITOF')
g <- predict(lgt.fit,newdata = grid, type = 'response')
head(g)
## 1 2 3 4 5 6
## 0.01503178 0.02009202 0.02680935 0.03569068 0.04737098 0.06262557
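The grid and contour code relies on a layout detail worth making explicit. expand.grid() varies its first argument fastest, so a vector of predictions computed row by row on the grid can be reshaped with matrix(v, length(x), length(y)). The result has exactly the layout that outer(x, y, f) returns and contour() expects: rows follow x, columns follow y. The KNN section later depends on this reshaping.

A toy check with hypothetical values (xs, ys, toy.grid and f are illustrative names only):

```r
# expand.grid() varies the first variable fastest
xs <- c(0, 1, 2)
ys <- c(10, 20)
toy.grid <- expand.grid(DMP = xs, EBITOF = ys)
f <- function(a, b) a + b

v  <- f(toy.grid$DMP, toy.grid$EBITOF)   # one value per grid row
m1 <- matrix(v, length(xs), length(ys))  # reshape: rows follow xs, columns ys
m2 <- outer(xs, ys, f)                   # same surface computed by outer()
all.equal(m1, m2)  # TRUE
```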
plot(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Campione di test (LGT)',
cex = 1.5,
xlim = xlim,
ylim = ylim)
z <- outer(x, y, function(x,y)predict(lgt.fit,
newdata = data.frame(DMP = x,
EBITOF = y),
type = 'response'))
contour(x, y, z, add = TRUE, levels = 0.20, lwd = 1)
points(grid, pch = '.', lwd = 1.25,
col=ifelse(g>=0.2, pal[1],pal[2]))
[Figure: Campione di test (LGT), i.e. the test sample with the logistic-regression decision boundary at a probability of default of 0.2; DMP on the x axis, EBITOF on the y axis]
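Because the logit is linear in DMP and EBITOF, the 0.2 contour drawn above is a straight line. A company sits on it when $\beta_0 + \beta_1\,\mathrm{DMP} + \beta_2\,\mathrm{EBITOF} = \operatorname{logit}(0.2)$.

The sketch below uses the coefficients printed by summary(lgt.fit), rounded as shown there. A hypothetical helper, boundary.ebitof(), solves this equation for EBITOF, and the last line checks that points on the line get a predicted probability of 0.2:

```r
# Coefficients copied from summary(lgt.fit) above
b0 <- -5.8695; b1 <- 3.0051; b2 <- -1.8921
cutoff <- 0.20

# On the boundary b0 + b1*DMP + b2*EBITOF = qlogis(cutoff), hence
# EBITOF = (qlogis(cutoff) - b0 - b1*DMP) / b2
boundary.ebitof <- function(dmp) (qlogis(cutoff) - b0 - b1 * dmp) / b2

dmp <- c(1, 2, 3)
ebitof <- boundary.ebitof(dmp)
plogis(b0 + b1 * dmp + b2 * ebitof)  # 0.2 for every point on the line
```

With these estimates the slope is $-\beta_1/\beta_2 = 3.0051/1.8921 \approx 1.59$. Along the boundary, each extra unit of DMP must be matched by roughly 1.6 extra units of EBITOF for a company to stay on the accept side. The accept side lies above the line because $\beta_2 < 0$.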
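Confusion matrix of the predictions on the test sample, using a 0.20 cut-off ('Class. prevista' is the predicted class, 'Class. effettiva' the actual class):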
lgt.pred <- rep("No", s.s/2)
lgt.pred[lgt.probs >= 0.20] <- "Si"
tab <- table(lgt.pred, test$Default,
dnn = c('Class. prevista',
'Class. effettiva'))
addmargins(tab)
## Class. effettiva
## Class. prevista No Si Sum
## No 72 1 73
## Si 6 21 27
## Sum 78 22 100
#error rate er
er <- mean(lgt.pred != test$Default); names(er) <- 'Error rate'
# sensitivity
sen <- tab[2,2]/(tab[1,2]+tab[2,2])
names(sen) <- 'Sensitivity'
# specificity
sp <- tab[1,1]/(tab[1,1]+tab[2,1]); names(sp) <- 'Specificity'
er; sen; sp
## Error rate
## 0.07
## Sensitivity
## 0.9545455
## Specificity
## 0.9230769
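The three metrics can be wrapped in a small helper so that the same calculation can be reused for the LDA, QDA and KNN confusion matrices that follow. conf.metrics() is a hypothetical name.

It assumes, as in the tables here, predicted classes on the rows and actual classes on the columns, both ordered No, Si. Sensitivity is the share of actual defaulters flagged as Si; specificity is the share of non-defaulters predicted as No.

```r
# Hypothetical helper: error rate, sensitivity and specificity from a 2 x 2
# confusion matrix (rows = predicted, columns = actual, both ordered No, Si)
conf.metrics <- function(tab) {
  c(`Error rate` = (tab[1, 2] + tab[2, 1]) / sum(tab),
    Sensitivity  = tab[2, 2] / sum(tab[, 2]),   # defaulters correctly flagged
    Specificity  = tab[1, 1] / sum(tab[, 1]))   # non-defaulters correctly accepted
}

# Counts copied from the logistic-regression confusion matrix above
lgt.counts <- matrix(c(72, 6, 1, 21), nrow = 2,
                     dimnames = list(predicted = c("No", "Si"),
                                     actual    = c("No", "Si")))
round(conf.metrics(lgt.counts), 4)  # 0.0700 0.9545 0.9231
```

With the tutorial's objects, conf.metrics(tab), conf.metrics(lda.tab), conf.metrics(qda.tab) and conf.metrics(knn.tab) would put the four classifiers side by side.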
Load the MASS package for linear discriminant analysis (LDA) and predict the response on the test sample:
require(MASS)
lda.fit <- lda(data = train, Default ~ DMP + EBITOF)
lda.probs <- predict(lda.fit,
                     newdata = test)  # predict.lda() has no 'type' argument
g.lda <- predict(lda.fit, newdata = grid)
plot(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Campione di test (LDA)',
cex = 1.5,
xlim = xlim,
ylim = ylim)
z <- outer(x, y, function(x,y)predict(lda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z, add = TRUE, levels = 0.20, lwd = 1)
points(grid, pch = ".", lwd = 1.25,
col=ifelse(g.lda$posterior[,2]>=0.2, pal[1],pal[2]))
[Figure: Campione di test (LDA), i.e. the test sample with the LDA decision boundary at a posterior probability of default of 0.2]
Confusion matrix with LDA results and test error rate:
lda.pred <- rep('No', s.s/2)
lda.pred[lda.probs$posterior[,2]>=0.2] <- 'Si'
lda.tab <- table(lda.pred, test$Default,
                 dnn = c('Class. prevista',
                         'Class. effettiva'))
addmargins(lda.tab)
## Class. effettiva
## Class. prevista No Si Sum
## No 72 1 73
## Si 6 21 27
## Sum 78 22 100
lda.er <- mean(lda.pred != test$Default)
lda.er
## [1] 0.07
qda.fit <- qda(data = train, Default ~ DMP + EBITOF)
qda.probs <- predict(qda.fit,
                     newdata = test)  # predict.qda() has no 'type' argument
g.qda <- predict(qda.fit, newdata = grid)
Confusion matrix with QDA results:
qda.pred <- rep('No', s.s/2)
qda.pred[qda.probs$posterior[,2] >= 0.2] <- 'Si'
qda.tab <- table(qda.pred, test$Default,
                 dnn = c('Class. prevista',
                         'Class. effettiva'))
addmargins(qda.tab)
## Class. effettiva
## Class. prevista No Si Sum
## No 72 2 74
## Si 6 20 26
## Sum 78 22 100
qda.er <- mean(qda.pred != test$Default)
qda.er
## [1] 0.08
plot(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Campione di test (QDA)',
cex = 1.5,
xlim = xlim,
ylim = ylim)
z <- outer(x, y, function(x,y)predict(qda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z, add = TRUE,
        levels = 0.20, lwd = 1)
points(grid,
       col = ifelse(g.qda$posterior[,2]>=0.2,
                    pal[1], pal[2]),
       pch= '.', cex = 0.5)
[Figure: Campione di test (QDA), i.e. the test sample with the QDA decision boundary at a posterior probability of default of 0.2]
library(class)
train.X <- cbind(train$DMP, train$EBITOF)
test.X <- cbind(test$DMP, test$EBITOF)
train.Default <- train$Default
set.seed(1)
kk <- 15
knn.pred <- knn(train = train.X,
test = test.X,
cl = train.Default,
k = kk,
prob = TRUE)
summary(knn.pred)
## No Si
## 87 13
Confusion matrix and test error rate for the KNN classifier. The cut-off is 0.2 on the estimated probability of default, instead of the default majority vote, which corresponds to 0.5:
knn.pred.prob <- attr(knn.pred, 'prob')
knn.probs <- ifelse(knn.pred == 'No',
1 - knn.pred.prob,knn.pred.prob)
knn.pred.cl <- rep("No", s.s/2)
knn.pred.cl[knn.probs >= 0.2] <- "Si"
knn.tab <- table(knn.pred.cl,
test$Default,
dnn = c('Class. prevista',
'Class. effettiva'))
addmargins(knn.tab)
## Class. effettiva
## Class. prevista No Si Sum
## No 74 2 76
## Si 4 20 24
## Sum 78 22 100
knn.er <- mean(knn.pred.cl != test$Default)
knn.er
## [1] 0.06
Access the KNN estimated probabilities via the attr() function and transform them into probabilities of default accordingly. Then prepare the matrices needed to plot the decision area (matrix(knn.probs, ...)).
knn.probs <- attr(knn.pred, "prob")
head(knn.probs)
## [1] 1.0000000 0.8000000 1.0000000 0.6000000 0.5333333 1.0000000
knn.probs <- ifelse(knn.pred == 'No',
1 - knn.probs,knn.probs)
head(knn.probs)
## [1] 0.0000000 0.8000000 0.0000000 0.4000000 0.4666667 0.0000000
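With prob = TRUE, class::knn() returns as the attribute prob the share of the k votes won by the predicted class, not the probability of default. The conversion used above can be wrapped in a small helper. knn.prob.si() and the one-predictor toy data below are hypothetical, chosen so that the votes are easy to count by hand. For example, with k = 3 the test point 5.2 has two non-defaulters and one defaulter among its nearest neighbours.

```r
library(class)

# Hypothetical helper: turn the vote share of the winning class returned by
# knn(..., prob = TRUE) into the estimated probability of default ("Si")
knn.prob.si <- function(pred) {
  p.win <- attr(pred, "prob")
  ifelse(pred == "No", 1 - p.win, p.win)
}

# Toy one-predictor data: non-defaulters at 0, 1, 2 and defaulters at 10, 11, 12
toy.train <- matrix(c(0, 1, 2, 10, 11, 12))
toy.cl <- factor(c("No", "No", "No", "Si", "Si", "Si"))
toy.test <- matrix(c(0.5, 5.2, 11))
pred <- knn(toy.train, toy.test, toy.cl, k = 3, prob = TRUE)
knn.prob.si(pred)  # 0, 1/3 and 1
```

knn() breaks ties in the vote at random, which is why the tutorial calls set.seed() before running it.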
knn.probs.kk <- matrix(knn.probs,
length(x), length(y))
z.knn <- knn(train = train.X,
test = grid,
cl = train.Default,
k = kk, prob = TRUE)
z.knn.probs <- attr(z.knn, "prob")
z.knn.probs <- ifelse(z.knn == 'No',
1 - z.knn.probs,
z.knn.probs)
z.knn.probs.kk <- matrix(z.knn.probs,
length(x),
length(y))
g.knn <- knn(train.X,
grid,
train.Default,
k = kk,
prob = TRUE)
g.knn.probs <- attr(g.knn, "prob")
g.knn.probs <- ifelse(g.knn == 'No',
1 - g.knn.probs,
g.knn.probs)
g.knn.probs.kk <- matrix(g.knn.probs,
length(x),
length(y))
Chart KNN decision boundary and points:
# chart KNN
plot(grid, col = ifelse(g.knn.probs.kk>=0.2,
pal[1], pal[2]),
cex = 0.25,
pch = ".",
main = 'KNN = 15',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No',
pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
cex = 1.5)
z.knn <- knn(train = train.X,
test = grid,
cl = train.Default,
k = kk, prob = TRUE)
z.knn.probs <- attr(z.knn, "prob")
z.knn.probs <- ifelse(z.knn == 'No',
1 - z.knn.probs,
z.knn.probs)
z.knn.probs.kk <- matrix(z.knn.probs,
length(x),
length(y))
contour(x, y, z.knn.probs.kk,
levels = 0.20,
add = TRUE)
[Figure: KNN = 15, i.e. the test sample with the KNN decision region and boundary at a probability of default of 0.2]
Plot the decision regions of the four classifiers side by side on the train data set:
par(mfrow=c(2,2))
plot(grid, pch = ".", lwd = 0.25,
col=ifelse(g>=0.2, pal[1],pal[2]),
cex = 0.25, main = 'LGT',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP, data = train,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5)
z <- outer(x, y,
function(x,y)predict(lgt.fit,
newdata = data.frame(DMP = x,
EBITOF = y),
type = 'response'))
contour(x, y, z, add = TRUE, levels = 0.20, lwd = 1)
plot(grid, pch = ".", lwd = 0.25,
col=ifelse(g.lda$posterior[,2]>=0.2,
pal[1],pal[2]),
main = 'LDA',
cex = 0.25,
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP, data = train,
col = ifelse(Default == 'No',
pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
cex = 1.5)
z <- outer(x, y,
function(x,y)predict(lda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z, add = TRUE, levels = 0.20, lwd = 1)
plot(grid,
col = ifelse(g.qda$posterior[,2]>=0.2, pal[1], pal[2]),
pch= '.', cex = 0.25,
main = 'QDA',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP,
data = train,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5)
z <- outer(x, y,
function(x,y)predict(qda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z,
        add = TRUE,
        levels = 0.20, lwd = 1)
plot(grid,
col = ifelse(g.knn.probs.kk>=0.2, pal[1], pal[2]),
cex = 0.25, pch = ".", main = 'KNN = 15',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP,
data = train,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5)
z.knn <- knn(train = train.X,
test = grid,
cl = train.Default,
k = kk,
prob = TRUE)
z.knn.probs <- attr(z.knn, "prob")
z.knn.probs <- ifelse(z.knn == 'No',
1 - z.knn.probs,
z.knn.probs)
z.knn.probs.kk <- matrix(z.knn.probs,
length(x),
length(y))
contour(x, y, z.knn.probs.kk,
levels = 0.20,
add = TRUE)
[Figure: 2-by-2 panel (LGT, LDA, QDA, KNN = 15) showing the decision regions and 0.2 boundaries on the train sample, with DMP on the x axis and EBITOF on the y axis]
Plot the four decision rules side by side on the test data set:
par(mfrow=c(2,2))
plot(grid, pch = ".", lwd = 0.25,
col=ifelse(g>=0.2, pal[1],pal[2]),
cex = 0.25, main = 'LGT',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5)
z <- outer(x, y,
function(x,y)predict(lgt.fit,
newdata = data.frame(DMP = x,
EBITOF = y),
type = 'response'))
contour(x, y, z, add = TRUE, levels = 0.20, lwd = 1)
plot(grid, pch = ".", lwd = 0.25,
col=ifelse(g.lda$posterior[,2]>=0.2,
pal[1],pal[2]),
main = 'LDA',
cex = 0.25,
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No',
pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
cex = 1.5)
z <- outer(x, y,
function(x,y)predict(lda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z, add = TRUE, levels = 0.20, lwd = 1)
plot(grid,
col = ifelse(g.qda$posterior[,2]>=0.2, pal[1], pal[2]),
pch= '.', cex = 0.25,
main = 'QDA',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP,
data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5)
z <- outer(x, y,
function(x,y)predict(qda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z,
        add = TRUE,
        levels = 0.20, lwd = 1)
plot(grid,
col = ifelse(g.knn.probs.kk>=0.2, pal[1], pal[2]),
cex = 0.25, pch = ".", main = 'KNN = 15',
xlim = xlim,
ylim = ylim)
points(EBITOF ~ DMP,
data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5)
z.knn <- knn(train = train.X,
test = grid,
cl = train.Default,
k = kk,
prob = TRUE)
z.knn.probs <- attr(z.knn, "prob")
z.knn.probs <- ifelse(z.knn == 'No',
1 - z.knn.probs,
z.knn.probs)
z.knn.probs.kk <- matrix(z.knn.probs,
length(x),
length(y))
contour(x, y, z.knn.probs.kk,
levels = 0.20,
add = TRUE)
[Figure: 2-by-2 panel (LGT, LDA, QDA, KNN = 15) showing the decision regions and 0.2 boundaries on the test sample, with DMP on the x axis and EBITOF on the y axis]
Plot the naïve predictors, each a single cut-off on one financial ratio (EBITOF ≤ 1.2 and DMP ≥ 2.0), on the train data sample:
par(mfrow=c(1,2))
plot(EBITOF ~ DMP, data = train,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5,
xlim = xlim,
ylim = ylim)
points(grid, pch = ".", lwd = 0.25,
col = ifelse(grid$EBITOF <= 1.2, pal[1], pal[2] ))
abline(h = 1.2, lty = 2, lwd = 1)
plot(EBITOF ~ DMP, data = train,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5,
xlim = xlim,
ylim = ylim)
points(grid, pch = ".", lwd = 0.25,
col = ifelse(grid$DMP >= 2.0, pal[1], pal[2] ))
abline(v = 2.0, lty = 2, lwd = 1)
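[Figure: two panels plotting EBITOF against DMP for the train sample. Left, the naïve rule EBITOF ≤ 1.2 (dashed horizontal line); right, the naïve rule DMP ≥ 2.0 (dashed vertical line).]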
par(mfrow=c(1,1))
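Each of the two naïve predictors is a one-ratio threshold rule: a firm is flagged as a likely default ('Si') when EBITOF ≤ 1.2 or, separately, when DMP ≥ 2.0. A minimal base-R sketch, assuming a data frame with numeric columns DMP and EBITOF like train and test; the helpers rule_ebitof() and rule_dmp() and the toy values are hypothetical:

```r
# Hypothetical helpers: one-ratio naive rules with the cut-offs used above
rule_ebitof <- function(df, cut = 1.2) {
  # weak coverage of financial charges -> predicted default ("Si")
  ifelse(df$EBITOF <= cut, "Si", "No")
}
rule_dmp <- function(df, cut = 2.0) {
  # high leverage -> predicted default ("Si")
  ifelse(df$DMP >= cut, "Si", "No")
}

# Toy firms (made-up values, not from the tutorial's data set)
toy <- data.frame(DMP    = c(0.5, 2.5, 3.0, 1.0),
                  EBITOF = c(3.0, 0.8, 2.0, 1.1))
rule_ebitof(toy)  # "No" "Si" "No" "Si"
rule_dmp(toy)     # "No" "Si" "Si" "No"
```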
Plot the same naïve predictors on the test data sample:
par(mfrow=c(1,2))
plot(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5,
xlim = xlim,
ylim = ylim)
points(grid, pch = ".", lwd = 0.25,
col = ifelse(grid$EBITOF <= 1.2, pal[1], pal[2] ))
abline(h = 1.2, lty = 2, lwd = 1)
plot(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5,
xlim = xlim,
ylim = ylim)
points(grid, pch = ".", lwd = 0.25,
col = ifelse(grid$DMP >= 2.0, pal[1], pal[2] ))
abline(v = 2.0, lty = 2, lwd = 1)
[Figure: the same two naïve rules, EBITOF ≤ 1.2 (left) and DMP ≥ 2.0 (right), plotted on the test sample.]
par(mfrow=c(1,1))
Combine the two naïve predictors in a logical AND decision rule (EBITOF ≤ 1.2 AND DMP ≥ 2.0), on both the train and the test data sets:
plot(EBITOF ~ DMP, data = train,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5,
xlim = xlim,
ylim = ylim, main = 'Train data set')
points(grid, pch = ".", lwd = 0.25,
col = ifelse((grid$EBITOF <= 1.2) &
(grid$DMP >= 2.0),
pal[1], pal[2] ))
abline(h = 1.2, lty = 2, lwd = 2)
abline(v = 2.0, lty = 2, lwd = 2)
plot(EBITOF ~ DMP, data = test,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3), cex = 1.5,
xlim = xlim,
ylim = ylim, main = 'Test data set')
points(grid, pch = ".", lwd = 0.25,
col = ifelse((grid$EBITOF <= 1.2) &
(grid$DMP >= 2.0),
pal[1], pal[2] ))
abline(h = 1.2, lty = 2, lwd = 2)
abline(v = 2.0, lty = 2, lwd = 2)
[Figure: the AND rule (EBITOF ≤ 1.2 AND DMP ≥ 2.0) in two panels, "Train data set" and "Test data set", plotting EBITOF against DMP.]
edmp.pred <- rep('No', s.s/2)
edmp.pred[(test$EBITOF <= 1.2) & (test$DMP >= 2.0)] <- 'Si'
edmp.tab <- table(edmp.pred,
test$Default,
dnn = c('Class. prevista',
'Class. effettiva'))
addmargins(edmp.tab)
## Class. effettiva
## Class. prevista No Si Sum
## No 77 4 81
## Si 1 18 19
## Sum 78 22 100
edmp.er <- mean(edmp.pred != test$Default)
edmp.er
## [1] 0.05
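The error rate above can be read straight off the confusion matrix, and the same counts give the rule's sensitivity and specificity. A short worked check in R, typing in the counts printed above (rows are the predicted class, columns the actual class):

```r
# Counts from the AND-rule confusion matrix printed above
# (rows: predicted class, columns: actual class)
tab <- matrix(c(77, 1, 4, 18), nrow = 2,
              dimnames = list(pred = c("No", "Si"), actual = c("No", "Si")))
err  <- (tab["Si", "No"] + tab["No", "Si"]) / sum(tab)  # misclassified share
sens <- tab["Si", "Si"] / sum(tab[, "Si"])  # defaults correctly flagged
spec <- tab["No", "No"] / sum(tab[, "No"])  # non-defaults correctly passed
round(c(error = err, sensitivity = sens, specificity = spec), 3)
# error 0.050, sensitivity 0.818, specificity 0.987
```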
Prepare data frame for ROC curves:
require(pROC)
res <- data.frame(Default = test$Default,
LGT = lgt.probs,
LDA = lda.probs$posterior[,2],
QDA = qda.probs$posterior[,2],
KNN = knn.probs,
EBITOF = test$EBITOF,
DMP = test$DMP)
head(res)
## Default LGT LDA QDA KNN EBITOF
## 198 No 0.019904605 0.012715513 0.009984894 0.0000000 2.05021166
## 978 Si 0.997576429 0.999267410 0.999048875 0.8000000 0.02209348
## 740 No 0.006655441 0.003060701 0.002081429 0.0000000 3.31898715
## 974 Si 0.562134689 0.619997321 0.585406157 0.4000000 1.17433535
## 14 No 0.766882711 0.818991764 0.813543896 0.4666667 1.56117928
## 258 No 0.002403967 0.001199350 0.001596928 0.0000000 1.63575939
## DMP
## 198 1.9473854
## 978 3.9704303
## 740 2.3772296
## 974 2.7757477
## 14 3.3324449
## 258 0.9771178
pal1 <- brewer.pal(7, "Dark2")
par(mfrow = c(1,2))
plot.roc(Default ~ LGT, data = res,
main = "Curve ROC machine learning su test",
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 70,
col=pal1[1],
grid = TRUE)
plot.roc(Default ~ LDA, data = res, add = TRUE,
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 60,
col=pal1[2])
plot.roc(Default ~ QDA, data = res, add = TRUE,
thresholds="best",
print.thres = "best",
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 50,
col=pal1[3])
plot.roc(Default ~ KNN, data = res, add = TRUE,
thresholds="best",
print.thres = "best",
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 40,
col=pal1[4])
legend("bottomright", legend=c("LGT", "LDA", "QDA", "KNN"),
col=c(pal1[1], pal1[2], pal1[3], pal1[4]),
lwd=2)
# second plot: naïve classifiers
plot.roc(Default ~ EBITOF, data = res,
thresholds="best",
main = "Curve ROC class. naive su test",
print.thres = "best",
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 70,
col=pal1[5],
grid = TRUE)
plot.roc(Default ~ DMP, data = res, add = TRUE,
thresholds="best",
print.thres = "best",
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 60,
col=pal1[6])
legend("bottomright", legend=c("EBITOF", "DMP"),
col=c(pal1[5], pal1[6]),
lwd=2)
[Figure: two ROC panels on the test sample, sensitivity (%) against specificity (%). Left, machine-learning classifiers: AUC 96.8% (LGT), 97.0% (LDA), 97.0% (QDA, best threshold 0.2 at specificity 91.0%, sensitivity 95.5%) and 94.4% (KNN, best threshold 0.2 at specificity 94.9%, sensitivity 90.9%). Right, naïve classifiers: EBITOF, AUC 91.7%, best threshold 1.2 (specificity 91.0%, sensitivity 81.8%); DMP, AUC 93.1%, best threshold 2.0 (specificity 82.1%, sensitivity 100.0%).]
par(mfrow = c(1,1))
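The AUC values on the ROC charts have a simple reading: the probability that a randomly chosen defaulted firm gets a higher score than a randomly chosen non-defaulted one, with ties counting one half. A base-R sketch of that rank (Mann-Whitney) formula, useful as a cross-check of pROC; the function auc_rank() and the toy scores are hypothetical. For ratios where a lower value signals risk, such as EBITOF, the score would have to be negated first (pROC chooses the direction automatically by default).

```r
# Hypothetical helper: empirical AUC via the Mann-Whitney pair comparison.
# score: higher = riskier; default: "Si"/"No" labels
auc_rank <- function(score, default) {
  pos <- score[default == "Si"]
  neg <- score[default == "No"]
  # compare every (default, non-default) pair; ties count 1/2
  cmp <- outer(pos, neg, function(a, b) (a > b) + 0.5 * (a == b))
  mean(cmp)
}

# Toy scores (made up): 3 defaults, 4 non-defaults
score   <- c(0.9, 0.7, 0.3, 0.4, 0.2, 0.1, 0.3)
default <- c("Si", "Si", "Si", "No", "No", "No", "No")
auc_rank(score, default)  # 0.875: 10.5 of the 12 pairs are ordered correctly
```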
2 Cross validation
n.iter <- 100
lgt.er <- rep(0,n.iter)
lda.er <- rep(0,n.iter)
qda.er <- rep(0,n.iter)
knn.er <- rep(0,n.iter)
for (i in 1:n.iter){set.seed(i)
z <- sort(sample(nrow(df.sm), nrow(df.sm) * 0.5))
train <- df.sm[z,]
test <- df.sm[-z,]
# logistic regression
lgt.fit <- glm(Default ~ DMP + EBITOF,
data = train,
family = 'binomial' )
lgt.probs <- predict(lgt.fit, newdata = test,
type = 'response')
lgt.pred <- rep("No", s.s/2)
lgt.pred[lgt.probs >= 0.20] <- "Si"
lgt.er[i] <- mean(lgt.pred != test$Default);
# LDA
lda.fit <- lda(data = train, Default ~ DMP + EBITOF)
lda.probs <- predict(lda.fit,
newdata = test, type = 'response')
lda.pred <- rep('No', s.s/2)
lda.pred[lda.probs$posterior[,2]>=0.2] <- 'Si'
lda.er[i] <- mean(lda.pred != test$Default)
# QDA
qda.fit <- qda(data = train, Default ~ DMP + EBITOF)
qda.probs <- predict(qda.fit,
newdata = test, type = "response")
qda.pred <- rep('No', s.s/2)
qda.pred[qda.probs$posterior[,2]>=0.2] <- 'Si'
qda.er[i] <- mean(qda.pred != test$Default)
# KNN
train.X <- cbind(train$DMP, train$EBITOF)
test.X <- cbind(test$DMP, test$EBITOF)
train.Default <- train$Default
kk = 15
knn.pred <- knn(train = train.X,
test = test.X,
cl = train.Default,
k = kk, prob = TRUE)
knn.pred.prob <- attr(knn.pred, 'prob')
knn.probs <- ifelse(knn.pred == 'No',
1 - knn.pred.prob,
knn.pred.prob)
knn.pred.cl <- rep('No', s.s/2)
knn.pred.cl[knn.probs >= 0.2] <- 'Si'
knn.er[i] <- mean(knn.pred.cl != test$Default)
}
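Strictly speaking, the loop above is a repeated validation-set (hold-out) approach rather than k-fold cross-validation: each of the 100 iterations draws a fresh random 50/50 split with set.seed(i), refits the models and records the test error. The same pattern can be written once as a generic function. This is a sketch under stated assumptions: repeated_holdout() and its fit_predict argument are hypothetical names, and the toy data are simulated, not the tutorial's.

```r
# Hypothetical helper: repeated random 50/50 hold-out error estimate.
# fit_predict(train, test) must return predicted labels for the test rows.
repeated_holdout <- function(df, fit_predict, n_iter = 100, prop = 0.5) {
  errors <- numeric(n_iter)
  for (i in seq_len(n_iter)) {
    set.seed(i)                              # one reproducible split per iteration
    idx   <- sample(nrow(df), floor(nrow(df) * prop))
    train <- df[idx, ]
    test  <- df[-idx, ]
    pred  <- fit_predict(train, test)
    errors[i] <- mean(pred != test$Default)  # test error rate of this split
  }
  errors
}

# Example: logistic regression with the 0.2 cut-off used in the tutorial
lgt_fp <- function(train, test) {
  fit <- glm(Default ~ DMP + EBITOF, data = train, family = binomial)
  p <- predict(fit, newdata = test, type = "response")
  ifelse(p >= 0.2, "Si", "No")
}

# Simulated toy data (not the tutorial's data set)
set.seed(42)
n <- 200
toy <- data.frame(DMP = runif(n, 0, 5), EBITOF = runif(n, -1, 4))
toy$Default <- factor(ifelse(toy$DMP - toy$EBITOF + rnorm(n, sd = 0.5) > 1,
                             "Si", "No"), levels = c("No", "Si"))
err <- repeated_holdout(toy, lgt_fp, n_iter = 20)
summary(err)
```

Passing the model as a function keeps the splitting logic in one place, so LDA, QDA, KNN or the naïve rules could be plugged in without copying the loop.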
df.res <- data.frame(LGT = lgt.er,
LDA = lda.er,
QDA = qda.er,
KNN = knn.er)
head(df.res)
## LGT LDA QDA KNN
## 1 0.07 0.07 0.08 0.06
## 2 0.07 0.07 0.07 0.08
## 3 0.09 0.08 0.11 0.09
## 4 0.04 0.05 0.05 0.04
## 5 0.06 0.09 0.07 0.08
## 6 0.07 0.07 0.06 0.06
require(tidyr)
df.res.n <- gather(df.res, "Model", "Error", 1:4, factor_key = TRUE)
head(df.res.n)
## Model Error
## 1 LGT 0.07
## 2 LGT 0.07
## 3 LGT 0.09
## 4 LGT 0.04
## 5 LGT 0.06
## 6 LGT 0.07
require(RColorBrewer)
pal2 <- brewer.pal(5, 'Dark2')
boxplot(data = df.res.n,
Error ~ Model,
col = pal2,
main = "Test error rate su validation set (100 iteraz.)")
[Figure: boxplots of the test error rate of LGT, LDA, QDA and KNN over the 100 validation-set iterations.]
3 Intuitive predictors
Again, considering Kahneman’s paragraphs quoted at the beginning of this essay, I find it appropriate to quote the sensible contribution of Prof. Tagliavini in Biffis et al. (2014, page 144):
There are solutions that are elegant and precise and solutions that are rough and approximate: not necessarily the former are better than the latter.
Loan managers need intuitive predictors to support their diagnosis. Consider, for instance, an intuitive predictor such as:

$$\text{DMP} - \text{EBITOF} \geq C \tag{1}$$
In other words, when the difference between the level of indebtedness (DMP) and the margin over financial charges (EBITOF) rises above a certain level C, we enter an area of excessive risk. In the figure below this boundary appears as the contour labelled −1, because EBITOF is on the y axis and the contours are drawn for EBITOF − DMP.
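Rule (1) needs only one subtraction and one comparison, which is what makes it usable by a loan officer without a computer. A minimal sketch with the hypothetical helper rule_intuitive() and the value C = 1 chosen from the charts in this section; it also checks the equivalent form used in the plotting code, where the contour is drawn for EBITOF − DMP and therefore sits at −C:

```r
# Hypothetical helper: intuitive rule (1), DMP - EBITOF >= C flags a default
rule_intuitive <- function(DMP, EBITOF, C = 1) {
  ifelse(DMP - EBITOF >= C, "Si", "No")
}

# Toy firms (made-up values)
DMP    <- c(3.0, 1.0, 2.5, 0.5)
EBITOF <- c(1.5, 2.0, 1.5, -1.0)
rule_intuitive(DMP, EBITOF)   # "Si" "No" "Si" "Si"

# Same rule written as in the plotting code: EBITOF - DMP <= -C
identical(rule_intuitive(DMP, EBITOF),
          ifelse(EBITOF - DMP <= -1, "Si", "No"))   # TRUE
```

The third firm sits exactly on the boundary (DMP − EBITOF = 1) and is flagged, because the rule uses ≥.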
The res data frame must be used here, because train and test were re-drawn with new seeds during the validation-set loop.
plot(EBITOF ~ DMP, data = res,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'Previsore intuitivo (DMP - EBITOF)',
cex = 1.5,
xlim = xlim,
ylim = ylim)
int <- outer(x, y, function(x,y)y - x)
contour(x, y, int, add = TRUE, level = c(-2.0, -1.0, 0.0),
lwd = 2, lty = 2,
col = pal[4])
plot(EBITOF ~ DMP, data = res,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = 'LDA e previsore intuitivo (DMP - EBITOF)',
cex = 1.5,
xlim = xlim,
ylim = ylim)
z <- outer(x, y, function(x,y)predict(lda.fit,
newdata = data.frame(DMP = x,
EBITOF = y))$posterior[,2])
contour(x, y, z, add = TRUE, level = 0.20, lwd = 1)
int <- outer(x, y, function(x,y)y - x)
contour(x, y, int, add = TRUE, level = c(-1.0),
lwd = 2, lty = 2,
col = pal[4])
[Figure: two panels plotting EBITOF against DMP. Left, the intuitive predictor (DMP − EBITOF), with contours of EBITOF − DMP at −2, −1 and 0; right, LDA and the intuitive predictor, with the LDA 0.2 posterior contour and the −1 contour of EBITOF − DMP.]
Choosing C = 1 from the charts above, we have:

$$\text{DMP} - \text{EBITOF} \geq 1 \tag{2}$$

or, equivalently, on the x-y chart where EBITOF is on the y axis:

$$\text{EBITOF} \leq \text{DMP} - 1 \tag{3}$$
The classifier in (3) gives the following outcome:
int <- res$EBITOF - res$DMP
int.pred <- rep('No', s.s/2)
int.pred[int <= -1.0] <- 'Si'
int.tab <- table(int.pred, res$Default,
dnn = c('Class. prevista', 'Class. effettiva'))
addmargins(int.tab)
## Class. effettiva
## Class. prevista No Si Sum
## No 73 2 75
## Si 5 20 25
## Sum 78 22 100
# error rate
int.er <- mean(int.pred != res$Default)
names(int.er) <- 'Error rate'
# sensitivity
int.sen <- int.tab[2,2]/(int.tab[1,2]+int.tab[2,2])
names(int.sen) <- 'Sensitivity'
# specificity
int.sp <- int.tab[1,1]/(int.tab[1,1]+int.tab[2,1])
names(int.sp) <- 'Specificity'
int.er; int.sen; int.sp
## Error rate
## 0.07
## Sensitivity
## 0.9090909
## Specificity
## 0.9358974
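The threshold C = 1 was read off the chart. One way to check that choice is to scan a grid of candidate values of C and compute, for each, the error rate and Youden's J = sensitivity + specificity − 1, the criterion pROC uses by default when asked for the "best" threshold. The function scan_C() and the toy data below are hypothetical; on the tutorial's data one would pass res$DMP, res$EBITOF and res$Default.

```r
# Hypothetical helper: evaluate the rule DMP - EBITOF >= C over candidate C values
scan_C <- function(DMP, EBITOF, default, C_grid) {
  score <- DMP - EBITOF
  out <- t(sapply(C_grid, function(C) {
    pred <- ifelse(score >= C, "Si", "No")
    sens <- mean(pred[default == "Si"] == "Si")   # defaults flagged
    spec <- mean(pred[default == "No"] == "No")   # non-defaults passed
    c(C = C, error = mean(pred != default),
      sensitivity = sens, specificity = spec, youden = sens + spec - 1)
  }))
  as.data.frame(out)
}

# Toy data (made up): 3 defaults, 5 non-defaults
DMP     <- c(3.5, 2.8, 2.0, 1.0, 0.8, 2.2, 1.5, 0.5)
EBITOF  <- c(0.5, 1.2, 0.8, 2.0, 3.0, 1.8, 1.0, 2.5)
default <- c("Si", "Si", "Si", "No", "No", "No", "No", "No")
tab <- scan_C(DMP, EBITOF, default, C_grid = c(0, 0.5, 1, 1.5))
tab[which.max(tab$youden), ]   # the row with the highest Youden's J
```

On these made-up values C = 1 separates the two groups perfectly; on real data the error rate and Youden's J usually trade off, and the scan makes that trade-off visible.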
res$INTUIT <- res$DMP - res$EBITOF
plot.roc(Default ~ LDA, data = res,
grid = TRUE,
main = "Curve ROC class. LDA e intuitivo",
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 70,
col=pal2[2])
plot.roc(Default ~ INTUIT, data = res, add = TRUE,
thresholds=c(1.0),
print.thres = c(1.0),
print.auc = T, percent = T,
print.auc.x = 30,
print.auc.y = 60,
col=pal2[5])
legend("bottomright", legend=c("LDA", "DMP - EBITOF"),
col=c(pal2[2], pal2[5]),
lwd=2)
[Figure: ROC curves of LDA and of the intuitive predictor DMP − EBITOF, sensitivity (%) against specificity (%): AUC 97.0% (LDA) and 97.4% (DMP − EBITOF), with the threshold 1.0 marked at specificity 93.6% and sensitivity 90.9%.]
Let’s test the intuitive predictor, together with the other classifiers and the two naïve rules, on 100 repeated validation sets:
n.iter <- 100
lgt.er <- rep(0,n.iter)
lda.er <- rep(0,n.iter)
qda.er <- rep(0,n.iter)
knn.er <- rep(0,n.iter)
int.er <- rep(0,n.iter)
eBit.er <- rep(0,n.iter)
dMp.er <- rep(0,n.iter)
for (i in 1:n.iter){set.seed(i)
z <- sort(sample(nrow(df.sm), nrow(df.sm) * 0.5))
train <- df.sm[z,]
test <- df.sm[-z,]
# logistic regression
lgt.fit <- glm(Default ~ DMP + EBITOF,
data = train,
family = 'binomial' )
lgt.probs <- predict(lgt.fit,
newdata = test,
type = 'response')
lgt.pred <- rep("No", s.s/2)
lgt.pred[lgt.probs >= 0.20] <- "Si"
lgt.er[i] <- mean(lgt.pred != test$Default);
# LDA
lda.fit <- lda(data = train, Default ~ DMP + EBITOF)
lda.probs <- predict(lda.fit,
newdata = test,
type = 'response')
lda.pred <- rep('No', s.s/2)
lda.pred[lda.probs$posterior[,2]>=0.2] <- 'Si'
lda.er[i] <- mean(lda.pred != test$Default)
# QDA
qda.fit <- qda(data = train, Default ~ DMP + EBITOF)
qda.probs <- predict(qda.fit,
newdata = test,
type = "response")
qda.pred <- rep('No', s.s/2)
qda.pred[qda.probs$posterior[,2]>=0.2] <- 'Si'
qda.er[i] <- mean(qda.pred != test$Default)
# KNN
train.X <- cbind(train$DMP, train$EBITOF)
test.X <- cbind(test$DMP, test$EBITOF)
train.Default <- train$Default
kk = 15
knn.pred <- knn(train = train.X,
test = test.X,
cl = train.Default,
k = kk, prob = TRUE)
knn.pred.prob <- attr(knn.pred, 'prob')
knn.probs <- ifelse(knn.pred == 'No',
1 - knn.pred.prob,
knn.pred.prob)
knn.pred.cl <- rep('No', s.s/2)
knn.pred.cl[knn.probs >= 0.2] <- 'Si'
knn.er[i] <- mean(knn.pred.cl != test$Default)
#INT
int <- test$EBITOF - test$DMP
int.pred <- rep('No', s.s/2)
int.pred[int <= -1.0] <- 'Si'
int.er[i] <- mean(int.pred != test$Default)
#EBITOF
eBit <- test$EBITOF
eBit.pred <- rep('No', s.s/2)
eBit.pred[eBit <= 1.2] <- 'Si'
eBit.er[i] <- mean(eBit.pred != test$Default)
#DMP
dMp <- test$DMP
dMp.pred <- rep('No', s.s/2)
dMp.pred[dMp >= 2.0] <- 'Si'
dMp.er[i] <- mean(dMp.pred != test$Default)
}
Plot CV test error rates of predictors:
df.res <- data.frame(LGT = lgt.er,
LDA = lda.er,
QDA = qda.er,
KNN = knn.er,
INT = int.er,
EOF = eBit.er,
DMP = dMp.er)
head(df.res)
## LGT LDA QDA KNN INT EOF DMP
## 1 0.07 0.07 0.08 0.06 0.07 0.11 0.16
## 2 0.07 0.07 0.07 0.08 0.06 0.17 0.25
## 3 0.09 0.08 0.11 0.09 0.08 0.16 0.24
## 4 0.04 0.05 0.05 0.04 0.05 0.16 0.19
## 5 0.06 0.09 0.07 0.08 0.04 0.12 0.14
## 6 0.07 0.07 0.06 0.06 0.04 0.10 0.20
require(tidyr)
df.res.n <- gather(df.res, "Model", "Error", 1:7, factor_key = TRUE)
head(df.res.n)
## Model Error
## 1 LGT 0.07
## 2 LGT 0.07
## 3 LGT 0.09
## 4 LGT 0.04
## 5 LGT 0.06
## 6 LGT 0.07
# order the Model levels by increasing median test error,
# so that the boxplot shows the best predictors first
med.err <- tapply(df.res.n$Error, df.res.n$Model, median)
df.res.n$Model <- factor(df.res.n$Model,
                         levels = names(med.err)[order(med.err)])
require(RColorBrewer)
pal2 <- brewer.pal(7, 'Dark2')
boxplot(data = df.res.n,
Error ~ Model,
col = pal2,
main = "Test error rate CV (valid. set 100 iteraz.)")
[Figure: boxplots of the test error rate over the 100 validation-set iterations, ordered by median: INT, LGT, LDA, QDA, KNN, EOF, DMP.]
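Ordering the boxes by median error can also be obtained with base R's reorder(), which sets the levels of a factor by a summary statistic of another variable. A small sketch on made-up error rates (not the tutorial's results):

```r
# Made-up test error rates for three predictors (long format)
df_long <- data.frame(
  Model = rep(c("DMP", "INT", "LGT"), each = 3),
  Error = c(0.20, 0.18, 0.22, 0.05, 0.06, 0.07, 0.06, 0.07, 0.08)
)
# Factor levels sorted by increasing median error
df_long$Model <- reorder(factor(df_long$Model), df_long$Error, FUN = median)
levels(df_long$Model)   # "INT" "LGT" "DMP"
boxplot(Error ~ Model, data = df_long,
        main = "Test error rate, ordered by median")
```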
Plot the intuitive predictor on the train data sample:
plot(EBITOF ~ DMP, data = trainn,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = '(DMP - EBITOF = 1) | train sample',
cex = 1.5,
xlim = xlim,
ylim = ylim)
points(grid, pch = ".", lwd = 0.25,
col=ifelse(grid$DMP - grid$EBITOF >= 1, pal[1],pal[2]))
int <- outer(x, y, function(x,y)y - x)
contour(x, y, int, add = TRUE, level = c(-1.0), lwd = 1.5, lty = 2)
[Figure: the intuitive predictor DMP − EBITOF = 1 (dashed line) on the train sample, EBITOF against DMP, with the grid shaded by predicted class.]
Plot the intuitive predictor on the test data sample:
plot(EBITOF ~ DMP, data = testt,
col = ifelse(Default == 'No', pal[2], pal[1]),
pch = ifelse(Default == 'No', 1, 3),
main = '(DMP - EBITOF = 1) | test sample',
cex = 1.5,
xlim = xlim,
ylim = ylim)
points(grid, pch = ".", lwd = 0.25,
col=ifelse(grid$DMP - grid$EBITOF >= 1, pal[1],pal[2]))
int <- outer(x, y, function(x,y)y - x)
contour(x, y, int, add = TRUE, level = c(-1.0), lwd = 1.5, lty = 2)
[Figure: the intuitive predictor DMP − EBITOF = 1 (dashed line) on the test sample, EBITOF against DMP, with the grid shaded by predicted class.]
4 Rating class clustering
Save each firm’s id number, currently stored in the row names, in a separate column: it will be needed later, because the row names are dropped when the dplyr functions below are applied.
df.sm$Firm <- row.names(df.sm)
df.sm.dat <- df.sm[ , 2:3]
# set seed and sample a smaller set too
set.seed(123)
df.sm1 <- sample_n(df.sm, 20,
replace = FALSE)
df.sm1.dat <- df.sm1[, 2:3]
#first clustering
hc.comp <- hclust(dist(df.sm.dat),
method = 'complete')
cl <- cutree(hc.comp, k = 4)
df.sm$Cluster <- as.factor(cl)
# second clustering, on the 20-firm sample
hc.sm <- hclust(dist(df.sm1.dat),
method = 'complete')
cl.sm <- cutree(hc.sm, k = 4)
df.sm1$Cluster <- as.factor(cl.sm)
# contingency table of defaults/non-defaults by cluster
tab.rat <- table(df.sm$Default, df.sm$Cluster)
# number of firms per cluster
tab.df.rat <- table(df.sm$Cluster); tab.df.rat
##
## 1 2 3 4
## 98 23 68 11
tab.df.fr <- prop.table(table(df.sm$Default, df.sm$Cluster),
margin = 2)
tab.df.fr[2,]
## 1 2 3 4
## 0.08163265 0.86956522 0.00000000 0.63636364
# contingency table of defaults/non-defaults by cluster
tab.rat1 <- table(df.sm1$Default, df.sm1$Cluster)
# number of firms per cluster
tab.df1.rat <- table(df.sm1$Cluster); tab.df1.rat
##
## 1 2 3 4
## 4 10 3 3
tab.df1.fr <- prop.table(table(df.sm1$Default, df.sm1$Cluster),
margin = 2)
tab.df1.fr[2,]
## 1 2 3 4
## 0.25 0.10 0.00 1.00
Compute the probability of default (PD) of each cluster on the mid-sized and small samples, and write the small-sample table to disk as a handout for students.
require(dplyr)
df.sm <- df.sm %>%
group_by(Cluster) %>%
mutate(PD = prop.table(table(Default))[2]
)
head(df.sm$PD)
## [1] 0.08163265 0.86956522 0.08163265 0.08163265 0.00000000 0.86956522
df.sm <- df.sm[order(df.sm$PD), ]
df.sm$PD <- as.factor(round(df.sm$PD, digits = 2))
df.sm1 <- df.sm1 %>%
group_by(Cluster) %>%
mutate(PD = prop.table(table(Default))[2]
)
head(df.sm1$PD)
## [1] 0.25 0.10 0.00 0.10 0.10 0.10
df.sm1 <- df.sm1[order(df.sm1$PD), ]
df.sm1$PD <- as.factor(round(df.sm1$PD, digits = 2))
write.table(df.sm1, file = "train_rat_20.csv",
sep = ";",
dec = ",")
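The cluster PD is simply the default frequency within each cluster. For readers without dplyr, the same computation can be sketched in base R with tapply() and ave(); the toy data below are made up and merely mimic the Default and Cluster columns of df.sm:

```r
# Made-up firms with a cluster label and a default flag
firms <- data.frame(
  Cluster = factor(c(1, 1, 1, 1, 2, 2, 3, 3, 3, 3)),
  Default = factor(c("No", "No", "No", "Si", "Si", "Si", "No", "No", "No", "No"),
                   levels = c("No", "Si"))
)
# PD of each cluster = share of defaulted firms in that cluster
pd <- tapply(firms$Default == "Si", firms$Cluster, mean)
pd                                   # 1: 0.25, 2: 1.00, 3: 0.00
# Attach each firm's cluster PD, as the grouped mutate() above does
firms$PD <- ave(as.numeric(firms$Default == "Si"), firms$Cluster)
firms[order(firms$PD), ]
```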
Chart the rating classes on a scatter plot:
xx <- range(df.sm$DMP); yy <- range(df.sm$EBITOF)
pl.rat <- ggplot(df.sm, aes(DMP, EBITOF,
color = PD,
shape = Default)) +
geom_point(size = 3, alpha = 0.5) +
scale_color_manual(values = c(pal[2],
pal[3],
pal[4],
pal[1]
)) +
scale_shape_manual(values = c(1,3)) +
xlim(xx) + ylim(yy)
pl.rat +
geom_abline(intercept = -1, slope = 1, linetype = 'dashed')
pl.rat1 <- ggplot(df.sm1, aes(DMP, EBITOF,
color = PD,
shape = Default,
label=Firm)) +
geom_point(size = 3, alpha = 1.0) +
scale_color_manual(values = c(pal[2],
pal[3],
pal[4],
pal[1]
)) +
scale_shape_manual(values = c(1,3)) +
xlim(xx) + ylim(yy) + geom_text(hjust=1.25, vjust=1.25, size = I(3))
pl.rat1 +
geom_abline(intercept = -1, slope = 1, linetype = 'dashed')
[Figure: two scatter plots of EBITOF against DMP, colour = cluster PD and shape = Default (No/Si), with the dashed line EBITOF = DMP − 1. First, the 200-firm sample, PD classes 0, 0.08, 0.64 and 0.87; second, the 20-firm sample labelled by firm id, PD classes 0, 0.1, 0.25 and 1.]
Lastly, plot the plain dendrograms of the hierarchical clustering, with no aesthetic attributes:
plot(hc.comp, cex = 0.2,
xlab = "",
sub = "",
main = "")
rect.hclust(hc.comp, k = 4,
border = pal[1])
# second plot
plot(hc.sm, cex = 1.0,
xlab = '',
sub = '',
main = ''
)
rect.hclust(hc.sm, k = 4, border = pal[1])
[Figure: two plain dendrograms with Height on the y axis and firm ids as leaf labels: first, complete-linkage clustering of the 200-firm sample (hc.comp); second, the 20-firm sample (hc.sm). Each is cut into four clusters, marked by rectangles.]
The excellent dendextend package by Galili (2015) is now required to adjust the colors of the labels and branches of the rating-class clusters on the dendrogram.
require(dendextend)
require(colorspace)
pl.rat1
hc <- as.dendrogram(hc.sm) %>%
rotate(c(4,1,2,3) )
hc %>% set("labels_cex", 1.0) %>%
set("labels_col", value = c(pal[1],
pal[4],
pal[3],
pal[2]),
k = 4) %>%
set("branches_k_color", value = c(pal[1],
pal[4],
pal[3],
pal[2]),
k = 4) %>%
plot(main = "")
hc %>% rect.dendrogram(k = 4,
border = 8,
lty = 5,
lwd = 1.0)
# abline(h = 2.25, lty = 2, lwd = 2.0)
[Figure: the labelled scatter plot of the 20-firm sample (pl.rat1) and its dendrogram, with labels and branches coloured by the four rating-class clusters and dashed rectangles around the clusters; y axis: Height.]
hc.all <- as.dendrogram(hc.comp)
hc.all %>%
set("labels_cex", 0.1) %>%
set("labels_col", value = c(pal[1],
pal[4],
pal[2],
pal[3]),
k = 4) %>%
set("branches_k_color", value = c(pal[1],
pal[4],
pal[2],
pal[3]),
k = 4) %>%
plot(main = "")
hc.all %>% rect.dendrogram(k = 4,
border = 8,
lty = 5,
lwd = 1.0)
[Figure: dendrogram of the 200-firm sample (hc.all), with labels and branches coloured by the four rating-class clusters and dashed rectangles around the clusters; y axis: Height.]
References
Biffis, Paolo (ed.) (2014), with contributions by M. S. Avi, G. Tagliavini and F. Zen, Analisi del Merito di Credito, EIF.e-Book.
Galili, Tal (2015), dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering, Bioinformatics.
Kahneman, Daniel (2011), Thinking, Fast and Slow, Penguin.