MAXENTROPIC AND QUANTITATIVE METHODS IN OPERATIONAL RISK MODELING
TRANSCRIPT
Erika [email protected]
Joint work with Silvia Mayoral and Henryk Gzyl
Department of Business Administration, Universidad Carlos III de Madrid
September, 2016
Outline
Work Review
(P1) Two maxentropic approaches to determine the probability density of compound losses. Insurance: Mathematics and Economics, 2015.
(P2) Density reconstructions with errors in the data. Entropy, 2014.
(P3) Maxentropic approach to decompound aggregate risk losses. Insurance: Mathematics and Economics, 2015.
(P4) Loss data analysis: Analysis of the sample dependence in density reconstruction by maxentropic methods. Insurance: Mathematics and Economics, 2016.
(P5) Maximum entropy approach to the loss data aggregation problem. Journal of Operational Risk, 2016.
Outline
1 Introduction
   Motivation
   Methodology: Loss Distribution Approach
      Univariate case
      Multivariate case
2 Maximum Entropy Approach
   Examples and Applications
   Theory
3 Numerical Results
4 Conclusions
Introduction: Motivation

Banks developed a conceptual framework to characterize and quantify risk, to put money aside to cover large-scale losses, and to ensure the stability of the financial system.
A similar problem appears in insurance, where premiums and optimal reinsurance levels must be set.
The difference between the two sectors lies in the availability of data: in operational risk the historical data sets are small, so the results may vary widely.
More precisely, we are interested in the calculation of regulatory/economic capital using the advanced models (LDA: loss distribution approach) allowed by Basel II.
The problem is to calculate the amount of money needed in order to be hedged at a high level of confidence (VaR at 99.9%).
The regulation states that the allocated capital charge should correspond to a 1-in-1000-year (quantile 0.999) worst possible loss event.
It is necessary to calculate the distribution of the losses, and the methodology used has to take into account the challenges posed by the size of the data sets, bimodality, heavy tails, and dependence, among others.
We propose to model the total losses by maximizing an entropy measure.
Introduction: Loss Distribution Approach (LDA), Univariate Case

Operational risk has to do with losses due to failures in processes, technology, people, etc.
Two variables play a role in operational risk:
   Severity (X): lognormal, gamma, Weibull, subexponential distributions, ...
   Frequency (N): Poisson, negative binomial, binomial distributions.

S = X_1 + X_2 + ... + X_N = Σ_{n=1}^{N} X_n

where S represents the aggregate claim amount in a fixed time period (typically one year) per risk event.
Approach used: fit parametric distributions to N and X and obtain f_S through recursive models or convolutions. No single distribution fits well over the entire data set.
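As a concrete illustration (not from the slides), here is a minimal Monte Carlo sketch of the univariate LDA, assuming the Poisson(λ = 60)/lognormal(µ = -0.01, σ = 2) pair that appears in the simulation table later in the talk; function and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_aggregate_losses(n_periods, lam=60.0, mu=-0.01, sigma=2.0):
    """Draw S = X_1 + ... + X_N per period, with N ~ Poisson(lam)
    and X_n i.i.d. lognormal(mu, sigma), independent of N."""
    counts = rng.poisson(lam, size=n_periods)                  # frequencies N
    return np.array([rng.lognormal(mu, sigma, n).sum() for n in counts])

S = simulate_aggregate_losses(10_000)
print(S.mean(), np.quantile(S, 0.999))                         # mean and rough 99.9% quantile of S
```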
Introduction: Loss Distribution Approach (LDA), Multivariate Case

Then each of these loss distributions is further summed over all types of risk to arrive at the total aggregate loss:

(S_1, ..., S_m) = (Σ_{i=1}^{N_1} X_{1i}, ..., Σ_{i=1}^{N_m} X_{mi})

S_T = Σ_{i=1}^{m} S_i = S_1 + S_2 + ... + S_m

where b = 1, ..., 8 (business lines), l = 1, ..., 7 (event types), and m = 8 × 7 is the number of types of risk in operational risk.
Dependence structure between the risks S_i: choice of a copula model.
Introduction: Loss Distribution Approach (LDA), Illustrative Example

Estimate parametric distributions for the frequency N and the severity X of each individual risk (maximum-likelihood estimation, MLE).
Compound the distributions (Panjer, convolutions, Fourier, ...). This yields f_{S_i} (univariate case).
Then the density f_{S_T} of the sum

S_T = S_1 + ... + S_B

(multivariate case) can be obtained by a sequential convolution procedure, sketched in code below:

1 Derive the distribution of the sum of a pair of values S_1 + S_2 from the joint density f_{S_1,S_2}(s_1, s_2) = f_{S_1}(s_1) f_{S_2}(s_2) c(s_1, s_2), where c is the density of the copula model C.
2 Apply the convolution integral f_{S_1+S_2}(l_{12}) = ∫ f_{S_1,S_2}(s_1, l_{12} - s_1) ds_1 = ∫ f_{S_1,S_2}(s_2, l_{12} - s_2) ds_2.

Steps (1) and (2) are repeated for the rest of the sum.
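A numerical sketch of steps (1) and (2), under assumptions not in the slides: gamma marginals tabulated on a uniform grid, and a Gaussian copula density standing in for c(·, ·).

```python
import numpy as np
from scipy.stats import norm, gamma

# Hypothetical gamma marginals for S1, S2, tabulated on a uniform grid.
ds = 0.05
s = np.arange(ds, 40.0, ds)
f1, F1 = gamma.pdf(s, a=4, scale=1.5), gamma.cdf(s, a=4, scale=1.5)
f2, F2 = gamma.pdf(s, a=6, scale=1.0), gamma.cdf(s, a=6, scale=1.0)

def gaussian_copula_density(u, v, rho=0.5):
    """Density c(u, v) of a bivariate Gaussian copula with correlation rho."""
    x = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))
    y = norm.ppf(np.clip(v, 1e-12, 1 - 1e-12))
    det = 1.0 - rho ** 2
    return np.exp(-(rho ** 2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * det)) / np.sqrt(det)

# Step (1): joint density f(s1, s2) = f1(s1) f2(s2) c(F1(s1), F2(s2)).
joint = f1[:, None] * f2[None, :] * gaussian_copula_density(F1[:, None], F2[None, :])

# Step (2): discretized convolution f_{S1+S2}(l) = sum_{s1} f(s1, l - s1) ds1.
n = len(s)
f_sum = np.zeros(2 * n - 1)
for i in range(n):
    f_sum[i:i + n] += joint[i, :] * ds        # shift the s2-axis by s1 = s[i]
l = (np.arange(2 * n - 1) + 2) * ds           # grid for l = s1 + s2
print(np.trapz(f_sum, l))                     # sanity check: should be close to 1
```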
Introduction: Illustrative Example

200 samples of size 10. Aggregation of 7 independent types of risk.
What happens when the data are scarce, as is common in banking?

Problems
   Parameter uncertainty.
   Bad fit in the tails.
   Scarcity of data; impossibility of fitting tails and body separately.
   Underestimation of the regulatory capital charge.
   This methodology gives a bad fit even when re-sampling is an alternative.
Maximum Entropy Approach: Illustrative Example, Size and Tail Concern

[Figure: parametric approach vs. maxentropic approach; density reconstructions of the losses, showing the true density, the average, and the individual reconstructions.]

Maxentropic methodologies provide a density reconstruction over the entire range of values.
Maximum Entropy Approach: Illustrative Example, Bimodal Concern

[Figure: two bimodal density reconstructions, panels (1) and (2).]

Table: Errors.
Error   (1)       (2)
MAE     0.02652   0.01291
RMSE    0.03286   0.01647

The maxentropic approach is able to model asymmetries.
Maximum Entropy Approach: Dependencies

We use maxentropic methodologies to model dependencies between different types of risks in the framework of operational risk.
Maximum Entropy Approach

Find a probability distribution P on some measure space (Ω, F) which is absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):

max_P  H_Q(P) = -∫_Ω ρ(ξ) ln ρ(ξ) dQ(ξ)

satisfying

P << Q such that E_P[AX] = Y,    ∫_Ω ρ(ξ) dQ(ξ) = 1

This method consists in finding the probability measure which best represents the current state of knowledge, namely the one with the largest information-theoretical entropy.
Maximum Entropy Approach: Jaynes, 1957

This concept was first used by Jaynes (1957) as a method of statistical inference for under-determined problems.
For example: a six-sided die, rolled 1000 times, comes up with an average of 4.7 dots. We want to estimate, as best we can, the probability distribution of the faces.
There are infinitely many 6-tuples (p_1, ..., p_6) with p_i ≥ 0, Σ_i p_i = 1 and Σ_i i·p_i = 4.7.
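A short sketch of Jaynes' die example: the maxent solution has the Gibbs form p_i ∝ exp(-λ i), and the single Lagrange multiplier λ is chosen so that the mean equals 4.7. The root-finding bracket below is our assumption.

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)

def maxent_mean(lam):
    """Mean of the maximum entropy distribution p_i proportional to exp(-lam * i)."""
    w = np.exp(-lam * faces)
    return (faces * w).sum() / w.sum()

# Choose the Lagrange multiplier so that the constraint sum_i i p_i = 4.7 holds.
lam = brentq(lambda l: maxent_mean(l) - 4.7, -5.0, 5.0)
p = np.exp(-lam * faces)
p /= p.sum()
print(np.round(p, 4))   # probabilities tilted toward the high faces
```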
Maximum Entropy Approach: General Overview

The essence of the maxentropic method consists in transforming a problem of the type

AX = Y,    X : Ω → C

into an optimization problem, by maximizing the entropy measure, where C is a constraint set of possible reconstructions. Then we have a unique and robust solution.
Laplace Transform

In probability theory and statistics, the Laplace transform is defined as an expectation of a random variable:

ψ(α) = E[e^{-αS}] = ∫_0^∞ e^{-αs} dF_S(s),    S ∈ ℝ_+

If any two continuous functions have the same Laplace transform, then those functions must be identical.
The Laplace transforms of some pdfs are not easy to invert, and there is no completely general method which works equally well for all possible transforms.
Laplace Transform

All the information about the problem can be compressed into a set of moments obtained from the Laplace transform, through a change of variables:

ψ(α) = E[e^{-αS}] = E[Y^α] = ∫_0^1 y^α dF_Y(y),    with Y = e^{-S} and Y ∈ (0, 1)

The selection of those moments should be such that we use only the most relevant or informative ones (Lin, 1992, and the entropy convergence theorem).
Laplace Transform

We want to model f_S with S > 0. When N = 0 we have S = 0, and we rewrite the Laplace transform as

ψ(α) = E[e^{-αS}] = P(S = 0) · E[e^{-αS} | S = 0] + P(S > 0) · E[e^{-αS} | S > 0]
     = P(N = 0) · E[e^{-αS} | N = 0] + P(N > 0) · E[e^{-αS} | N > 0]

where P(S = 0) = P(N = 0) = p_0. Then

ψ(α) = p_0 · 1 + (1 - p_0) · E[e^{-αS} | N > 0]

μ(α) = E[e^{-αS} | N > 0] = (ψ(α) - p_0) / (1 - p_0)

ψ(α) and p_0 have to be estimated from the data.
Input of the Methodology: Univariate Case

Thus, after a change of variables, the problem becomes to determine f_S from the integral constraint

E[e^{-αS} | S > 0] = ∫_0^1 y^{α_j} f_Y(y) dy = μ(α_j),    j = 0, ..., K.

Analytical form

ψ(α_k) = E(e^{-α_k S}) = Σ_{n=0}^∞ (φ_X(α_k))^n p_n = G(φ_X(α_k)),    with α_k = α_0/k

Numerical form

ψ(α_k) = (1/T) Σ_{i=1}^T e^{-α_k s_i},    with α_k = α_0/k

where
α_0 = 1.5: fractional value; k = 1, ..., K: optimal number of moments.
φ_X(α_k): Laplace transform of X, α_k ∈ ℝ_+.
G(·): probability generating function of the frequencies.
ψ(α_k): Laplace transform of the total losses.
T: sample size.
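The numerical form is straightforward to compute; a sketch assuming a one-dimensional array of observed totals (function and variable names are ours):

```python
import numpy as np

def maxent_inputs(sample, K=8, alpha0=1.5):
    """Empirical inputs: psi(alpha_k) = (1/T) sum_i exp(-alpha_k s_i) for
    alpha_k = alpha0 / k, corrected for the atom at zero via
    mu(alpha_k) = (psi(alpha_k) - p0) / (1 - p0), with p0 = P(N = 0)."""
    s = np.asarray(sample, dtype=float)
    p0 = np.mean(s == 0.0)                        # estimate of P(S = 0) = P(N = 0)
    alphas = alpha0 / np.arange(1, K + 1)         # fractional exponents alpha_k
    psi = np.array([np.exp(-a * s).mean() for a in alphas])
    mu = (psi - p0) / (1.0 - p0)
    return alphas, mu
```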
Input of the Methodology: Univariate Case

Analytical form
Fit the frequency and severity distributions parametrically and calculate the Laplace transform through the probability generating function. For the Poisson-Gamma case:

ψ(α_k) = exp(-ℓ(1 - b^a (α_k + b)^{-a})),    with α_k = α_0/k

The quality of the results is linked to how well the data fit the chosen distributions.
It is not possible to find a closed form of ψ(α_k) for some pdfs. This is particularly true for long-tailed pdfs, for example the lognormal distribution.
Input of the Methodology: Multivariate Case, Dependencies

(1) We can add dependencies to our input, knowing each f_{S_i}:

ψ(α_k) = E[e^{-α_k(S_1 + S_2 + ... + S_B)}] = Σ_{i=1}^{N-1} e^{-(s_{1i} + s_{2i} + ... + s_{Bi}) α_k} f(s_{1i}, s_{2i}, ..., s_{Bi}) Δs_1 Δs_2 ... Δs_B

where N is the number of partitions used in the discretization and

f(s_1, s_2, ..., s_B) = c[F_1(s_1), ..., F_B(s_B)] Π_{i=1}^B f_{S_i}(s_i)

is the joint distribution, c is the density of the copula model C, and f_{S_1}, ..., f_{S_B} are the marginal densities.

(2) Simply ψ(α_k) = (1/T) Σ_{i=1}^T e^{-α_k(s_{1i} + s_{2i} + ... + s_{Bi})}, where T is the sample size.
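Option (2) amounts to one line per moment; a sketch assuming a T × B matrix whose rows are simultaneous observations of the B risks:

```python
import numpy as np

def joint_maxent_inputs(losses, K=8, alpha0=1.5):
    """psi(alpha_k) = (1/T) sum_i exp(-alpha_k (s_1i + ... + s_Bi)),
    i.e. the empirical Laplace transform of the total S_T."""
    totals = np.asarray(losses, dtype=float).sum(axis=1)   # row sums s_1i + ... + s_Bi
    alphas = alpha0 / np.arange(1, K + 1)
    return alphas, np.array([np.exp(-a * totals).mean() for a in alphas])
```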
Maximum Entropy Methods

max H(f) = -∫_0^1 f_Y(y) ln f_Y(y) dy

SME approach: find the probability density on [0, 1] such that

ψ(α_k) = ∫_0^1 y^{α_k} f(y) dy = μ(α_k),    with Y = e^{-S}

where μ(α_k) = (ψ(α_k) - P(N = 0)) / (1 - P(N = 0)).

MEM approach: extension of the SME approach that allows the inclusion of a reference measure Q, which is a parametric distribution.
SMEE approach: extension of the SME approach when we assume that the data are noisy:

∫_0^1 y^{α_k} f(y) dy ∈ C_k = [a_k, b_k],    with Y = e^{-S}

These methods consist in finding the probability measure which best represents the current state of knowledge, namely the one with the largest information-theoretical entropy.
Standard Maximum Entropy Method (SME)

In general, the maximum entropy density is obtained by maximizing the entropy measure

max H(f) = -∫_0^1 f_Y(y) ln f_Y(y) dy

satisfying

E[y^{α_k}] = ∫_0^1 y^{α_k} f_Y(y) dy = μ_{α_k},    k = 1, 2, ..., K, with K = 8

∫_0^1 f_Y(y) dy = 1

where
μ_k: k-th moment, positive and known.
K = 8: number of moments.
Fractional values: α_k = α_0/k, α_0 = 1.5.
Standard Maximum Entropy Method (SME)

When the problem has a solution, it can be expressed in terms of the Lagrange multipliers as

f*_Y(y) = (1/Z(λ)) exp(-Σ_{k=1}^K λ_k y^{α_k}) = exp(-Σ_{k=0}^K λ_k y^{α_k})

where the normalization constant is determined by

Z(λ) = ∫_Ω exp(-Σ_{k=1}^K λ_k y^{α_k}) dy

Then it is necessary to find λ*, the minimizer of the dual entropy, which is a function of the Lagrange multipliers λ and is given by

H(λ) = ln Z(λ) + <λ, μ> = Σ(λ, μ)

Basically it is a problem of minimizing a convex function, and the step size has to be reduced as the iteration progresses (Barzilai-Borwein non-monotone gradient method):

f*_Y(y) = (1/Z(λ*)) exp(-Σ_{k=1}^K λ*_k y^{α_k}),    Y ∈ (0, 1)
Standard Maximum Entropy Method (SME)

1 The starting point is ∫_0^∞ e^{-αs} dF_S(s) = μ_k, S ∈ (0, ∞).
2 Make a change of variables, setting Y = e^{-S}, Y ∈ (0, 1).
3 Find a minimum of the dual entropy, which is a function of λ:

min_λ Σ(λ, μ) = ln Z(λ) + <λ, μ>,    where Z(λ) = ∫_0^1 e^{-Σ_{k=1}^K λ_k y^{α_k}} dy.

4 The solution is

f*_Y(y) = (1/Z(λ*)) e^{-Σ_{k=1}^K λ*_k y^{α_k}} = e^{-Σ_{k=0}^K λ*_k y^{α_k}},    Y ∈ (0, 1)

5 Undo the change of variables:

f*_S(s) = e^{-s} f*_Y(e^{-s}),    S ∈ (0, ∞)
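Steps (3)-(5) in code: a minimal sketch that evaluates Z(λ) by midpoint quadrature on (0, 1) and minimizes the dual with a generic quasi-Newton routine standing in for the Barzilai-Borwein scheme used in the papers; all names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def sme_density(alphas, mu, grid=2000):
    """Minimize the dual ln Z(lambda) + <lambda, mu>, where
    Z(lambda) = int_0^1 exp(-sum_k lambda_k y^alpha_k) dy."""
    y = (np.arange(grid) + 0.5) / grid                    # midpoint rule on (0, 1)
    Ypow = y[None, :] ** np.asarray(alphas)[:, None]      # K x grid table of y^alpha_k

    def dual(lam):
        return np.log(np.exp(-lam @ Ypow).mean()) + lam @ mu

    lam_star = minimize(dual, np.zeros(len(alphas)), method="BFGS").x
    f_y = np.exp(-lam_star @ Ypow)
    f_y /= f_y.mean()                                     # normalize so int_0^1 f_Y = 1
    return y, f_y, lam_star

# Step (5): undo the change of variables. With s = -log(y),
# f_S(s) = e^{-s} f_Y(e^{-s}) = y * f_Y(y) on the transformed grid:
# alphas, mu = maxent_inputs(sample)        # inputs from the earlier sketch
# y, f_y, lam = sme_density(alphas, mu)
# s, f_s = -np.log(y), y * f_y
```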
Extensions of the SME Approach: SMEE

Remember that μ_k = E[e^{-α_k S}] was estimated from observed values s_1, ..., s_N of S, so there is some measurement error ε.

Approach I: the input μ_k is an interval C_k = [a_k, b_k]. Find f*_S such that ∫_Ω e^{-α_k s} dF_S(s) ∈ C_k.

Approach II: we have two inputs, μ_k and an interval [a_k, b_k] for the errors, centered at zero. Find f*_S and p*_k such that

∫_Ω e^{-α_k s} dF_S(s) + p_k a_k + (1 - p_k) b_k = μ_k
Extensions of the SME Approach (SMEE)

Approach II: we have two inputs, μ_k and an interval [a_k, b_k] for the errors, centered at zero:

μ_k = E[e^{-α_k S}] + ε,    where ε ∈ [a_k, b_k]

max H(f, p) = -∫_0^1 f(y) ln f(y) dy - Σ_{k=1}^K (p_k ln p_k + (1 - p_k) ln(1 - p_k))

such that

∫_0^1 y^{α_k} f_Y(y) dy + p_k a_k + (1 - p_k) b_k = μ_k
0 < p_k < 1,    ∫_0^1 f_Y(y) dy = 1,    k = 1, ..., K, with K = 8.
The solution can be expressed in terms of the Lagrange multipliers:

f*(y) = e^{-Σ_{k=1}^K λ_k y^{α_k}} / Z(λ),    p*_k = e^{-a_k λ_k} / (e^{-a_k λ_k} + e^{-b_k λ_k})

Here the normalization factor Z(λ) is as above. The vector λ* of Lagrange multipliers is found by minimizing the dual entropy

H(λ) = ln Z(λ) + Σ_{k=1}^K ln(e^{-a_k λ_k} + e^{-b_k λ_k}) + <λ, μ> = Σ(λ)

Once λ* is found, the estimator of the measurement error is given by

ε_k = (a_k e^{-a_k λ*_k} + b_k e^{-b_k λ*_k}) / (e^{-a_k λ*_k} + e^{-b_k λ*_k})

f*(y) = e^{-Σ_{k=1}^K λ*_k y^{α_k}} / Z(λ*),    p*_k = e^{-a_k λ*_k} / (e^{-a_k λ*_k} + e^{-b_k λ*_k})
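The SMEE dual differs from the SME dual only by the error terms; a sketch under the same quadrature assumptions as before, with a_k < 0 < b_k supplied as arrays:

```python
import numpy as np
from scipy.optimize import minimize

def smee_density(alphas, mu, a, b, grid=2000):
    """Minimize ln Z(lambda) + sum_k ln(exp(-a_k lam_k) + exp(-b_k lam_k)) + <lambda, mu>."""
    y = (np.arange(grid) + 0.5) / grid
    Ypow = y[None, :] ** np.asarray(alphas)[:, None]

    def dual(lam):
        z = np.exp(-lam @ Ypow).mean()
        return np.log(z) + np.log(np.exp(-a * lam) + np.exp(-b * lam)).sum() + lam @ mu

    lam = minimize(dual, np.zeros(len(alphas)), method="BFGS").x
    p = np.exp(-a * lam) / (np.exp(-a * lam) + np.exp(-b * lam))       # p_k*
    eps = (a * np.exp(-a * lam) + b * np.exp(-b * lam)) / \
          (np.exp(-a * lam) + np.exp(-b * lam))                        # error estimates eps_k
    f_y = np.exp(-lam @ Ypow)
    f_y /= f_y.mean()
    return y, f_y, p, eps
```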
Numerical Results

1 Two maxentropic approaches to determine the probability density of compound losses.
2 Density reconstructions with errors in the data.
3 Maxentropic approach to decompound aggregate risk losses.
4 Loss data analysis: analysis of the sample dependence in density reconstruction by maxentropic methods.
5 Maximum entropy approach to the loss data aggregation problem.
Numerical Results

To test the methodology we consider different combinations of frequencies and severity losses.
We use a sample large enough not to have to worry about the effect of the sample size on the results.
We use several methods to verify the quality of the results: L1 and L2 distances, MAE and RMSE distances, visual comparisons, and goodness-of-fit tests.

MAE = (1/T) Σ_{n=1}^T |F(x_n) - F_e(x_n)|

RMSE = sqrt( (1/T) Σ_{n=1}^T (F(x_n) - F_e(x_n))^2 )

RMSE is more sensitive to outliers, because this measure gives a relatively high weight to large errors. So the greater the difference between MAE and RMSE, the greater the variance of the individual errors in the sample.
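Both distances compare the true and estimated cdfs at the sample points; a small helper (names are ours):

```python
import numpy as np

def mae_rmse(F_true, F_est):
    """MAE and RMSE between true and estimated cdf values at the same points."""
    d = np.asarray(F_true) - np.asarray(F_est)
    return np.abs(d).mean(), np.sqrt((d ** 2).mean())
```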
Simulation Details

To test the methodology we consider different combinations of frequencies and loss distributions.

Table: Inputs for the simulation of S
S_bh    N_bh                                  X_bh
S1:     POISSON (λ = 80)                      CHAMPERNOWNE (α = 20, M = 85, c = 15)
S2:     POISSON (λ = 60)                      LOGNORMAL (µ = -0.01, σ = 2)
S3:     BINOMIAL (n = 70, p = 0.5)            PARETO (shape = 10, scale = 85)
S4:     BINOMIAL (n = 62, p = 0.5)            CHAMPERNOWNE (α = 10, M = 125, c = 45)
S5:     BINOMIAL (n = 50, p = 0.5)            GAMMA (shape = 4500, rate = 15)
S6:     BINOMIAL (n = 76, p = 0.5)            GAMMA (shape = 9000, rate = 35)
S7:     NEGATIVE BINOMIAL (r = 80, p = 0.3)   WEIBULL (shape = 200, scale = 50)
Tail:   NEGATIVE BINOMIAL (r = 90, p = 0.8)   PARETO (shape = 5.5, scale = 5550)

All the risks are independent.
Results

Table: MAE and RMSE for a sample size of 5000
Approach   MAE        RMSE
SMEE       0.005928   0.006836
SME        0.006395   0.009399
SME Results

Figure: SME reconstructions for different sample sizes (10, 20, 50, 100, 500, 1000). Tolerance: 1 × 10^{-6}. [Each panel shows the true density, the average, and the individual reconstructions of the losses.]
SMEE Results

Figure: SMEE reconstructions for different sample sizes (10, 20, 50, 100, 500, 1000). Tolerance: 1 × 10^{-6}. [Each panel shows the true density, the average, and the individual reconstructions of the losses.]
SME Results & SMEE Results

Table: SME & SMEE results for different sample sizes. Tolerance: 1 × 10^{-6}
Size   Error   Mean (SME)   Mean (SMEE)   sd (SME)   sd (SMEE)
10     MAE     0.0880       0.0690        0.0472     0.0299
       RMSE    0.1010       0.0784        0.0527     0.0324
20     MAE     0.0619       0.0620        0.0339     0.0324
       RMSE    0.0702       0.0705        0.0374     0.0378
50     MAE     0.0377       0.0378        0.0213     0.0215
       RMSE    0.0429       0.0430        0.0234     0.0237
100    MAE     0.0266       0.0267        0.0158     0.0157
       RMSE    0.0304       0.0306        0.0172     0.0173
200    MAE     0.0194       0.0196        0.0099     0.0099
       RMSE    0.0225       0.0229        0.0111     0.0111
500    MAE     0.0128       0.0131        0.0063     0.0063
       RMSE    0.0153       0.0156        0.0069     0.0069
1000   MAE     0.0093       0.0093        0.0035     0.0035
       RMSE    0.0115       0.0114        0.0039     0.0040
SME Results & SMEE Results

Table: SME & SMEE results for different sample sizes. Tolerance: 1 × 10^{-6}
Size   Area (SME)   Area (SMEE)   AVE. (SME)   AVE. (SMEE)
10     2.625        2.619         0.0092       0.0069
                                  0.0120       0.0110
20     1.523        1.759         0.0082       0.0066
                                  0.0116       0.0109
50     0.955        1.044         0.0082       0.0065
                                  0.0106       0.0102
100    0.696        0.690         0.0053       0.0060
                                  0.0066       0.0082
200    0.538        0.552         0.0053       0.0063
                                  0.0067       0.0072
500    0.326        0.294         0.0055       0.0058
                                  0.0076       0.0083
1000   0.203        0.200         0.0054       0.0057
                                  0.0078       0.0082
Computation of the Regulatory Capital

Table: Comparison of VaR and TVaR at 95% and 99% for a unique sample of size 5000
         γ       Empirical   SME      SMEE
VaR      0.950   5.05        4.935    5.004
         0.990   5.72        5.755    5.772
TVaR     0.950   5.45        5.443    5.461
         0.990   6.05        6.0207   6.014

Table: Mean and standard deviation of the VaR and TVaR for 200 samples of different sizes
Size   VaR(95%)        TVaR(95%)       VaR(99%)        TVaR(99%)
       SME     SMEE    SME     SMEE    SME     SMEE    SME     SMEE
10     4.96    4.87    5.30    5.156   4.331   5.283   4.328   5.634
100    4.96    4.931   5.44    5.43    5.457   5.779   5.694   6.016
500    4.95    4.93    5.45    5.45    5.708   5.822   5.972   6.017
1000   4.95    4.95    5.45    5.45    5.729   5.828   5.977   6.064
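Given a maxentropic density f*_S tabulated on a uniform grid, VaR and TVaR follow by quadrature; a sketch (grid and names are ours):

```python
import numpy as np

def var_tvar(s, f_s, gamma=0.99):
    """VaR_gamma as the gamma-quantile of the cdf built from f_S,
    and TVaR_gamma = E[S | S > VaR_gamma], both by quadrature on the grid s."""
    ds = s[1] - s[0]                          # uniform grid spacing
    F = np.cumsum(f_s) * ds                   # cdf
    i = np.searchsorted(F, gamma)             # first index with F >= gamma
    var = s[i]
    tail_mass = f_s[i:].sum() * ds            # P(S > VaR), approx. 1 - gamma
    tvar = (s[i:] * f_s[i:]).sum() * ds / tail_mass
    return var, tvar
```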
Conclusions

In this work we present an application of maxentropic methodologies to operational risk. We showed that this methodology can provide a good density reconstruction over the entire range of values, in cases of scarcity, heavy tails and asymmetries, using only eight moments as input.
This methodology makes it possible to obtain the density distributions at different levels of aggregation, and allows us to include dependencies between different types of risks:
   we can join marginal densities obtained from any methodology and impose any dependence relation on them, or
   we can obtain the joint distribution directly from the data and avoid bad estimations.
The estimation of the underlying loss process provides a starting point to design policies, set premiums and reserves, calculate optimal reinsurance levels, and calculate risk pressures for solvency purposes in insurance and risk management. It is also useful in structural engineering to describe the accumulated damage of a structure, to mention one more possible application.
Conclusions

Here we work with four different variants of the maxentropic methodology (SME / MEM / SMEE (approach I) / SMEE (approach II)). Two of these allow for a little uncertainty in the input, using an interval for the moments instead of the sample estimate. Additionally, the maximum entropy in the mean (MEM) allows a reference measure to be added to the estimation to improve the results.
In general the SMEE method improves the quality of the results in terms of convergence and number of iterations.
This methodology can also be used to estimate the distribution of the individual losses f_X at the first level of aggregation, when we have the distribution of the aggregated losses f_S and the distribution of the frequency p_n.
Maximum Entropy Method - General Overview

The essence of the maxentropic method consists in transforming a problem of the type

AX = Y,    X : Ω → C

into a problem of convex optimization, by maximizing the entropy measure, where C is a constraint set of possible reconstructions (values of a random variable X) and Ω is a sample space. Then we have a unique and robust solution.
Among those x ∈ C yielding similar reconstruction error, choose one with a norm as small as possible.
Maximum Entropy Method - General Overview

Find a probability distribution P on some measure space (Ω, F), absolutely continuous with respect to some (usually σ-finite) reference measure Q on (Ω, F):

max_P  S_Q(P) = -∫_Ω ρ(ξ) ln ρ(ξ) dQ(ξ) = -∫_Ω (dP/dQ) ln(dP/dQ) dQ = -∫_Ω ln(dP/dQ) dP    (1)

satisfying

P << Q such that E_P[AX] = Y    (2)

where
Q is the reference measure, which reflects the information that we have;
E_P[AX] = A E_P[X] = ∫_Ω Aξ dP(ξ) = ∫_Ω Aξ ρ(ξ) dQ(ξ) = y;
∫_Ω ρ(ξ) dQ(ξ) = 1;
dP(ξ) = ρ(ξ) dQ(ξ), dQ(ξ) = q dξ.

Note that if such a measure P is found, then x_j = E_P[X_j].
We introduce Lagrange multipliers λ to obtain the result

dP(λ) = (exp(-<λ, Aξ>) / Z(λ)) dQ(ξ)

where the normalization constant is determined by

Z(λ) = ∫_Ω e^{-<λ, Aξ>} dQ(ξ)    (3)

Then it is necessary to find λ*, the minimizer of the dual entropy, which is a function of λ and is given by

inf_λ S_Q(λ) = -sup_P (-S_Q(P))

S_Q(λ) = -∫_Ω (exp(-<λ, Aξ>) / Z(λ)) ln(exp(-<λ, Aξ>) / Z(λ)) dQ(ξ)

min_λ Σ(λ, y) = ln Z(λ) + <λ, y>
Basically it is a problem of minimizing a convex function that is very flat, and the step size has to be reduced as the iteration progresses:

min_λ Σ(λ, y) = min_λ ln Z(λ) + <λ, y>    (4)

Z(λ) = ∫_Ω e^{-<λ, Aξ>} dQ(ξ)    (5)

Barzilai and Borwein optimization method (BB-method); a sketch follows below.

Optimal solution:

ρ*(ξ) = exp(-<λ*, Aξ>) / Z(λ*)

and

x*_j = exp(-A^t λ*) / z(λ*),    with z(λ*) = exp(-λ_0)
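A generic sketch of the Barzilai-Borwein step (the papers use a non-monotone variant; this simplified version only illustrates the adaptive step size, and all names are ours):

```python
import numpy as np

def bb_minimize(grad, x0, iters=500, step0=1e-3):
    """Gradient descent with the Barzilai-Borwein step
    t = <dx, dg> / <dg, dg>, which adapts to the local curvature;
    useful because the dual Sigma(lambda) is convex but very flat."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    t = step0
    for _ in range(iters):
        x_new = x - t * g
        g_new = grad(x_new)
        dx, dg = x_new - x, g_new - g
        if dg @ dg == 0.0:                    # gradient stalled: done
            break
        t = abs(dx @ dg) / (dg @ dg)          # BB step size (kept positive)
        x, g = x_new, g_new
    return x
```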
Measures of Reference

The choice of the measure Q is up to the modeler, and it may be thought of as a first guess of the unknown distribution.

S_Q(P) = -∫_Ω ρ(ξ) ln ρ(ξ) dQ(ξ)

S_Q(λ) = Σ(λ, μ) = ln Z(λ) + <λ, μ>

Z(λ) = E_Q[e^{-<λ, Aξ>}] = ∫_Ω e^{-<λ, Aξ>} dQ(ξ)

Uniform distribution ξ ~ U(0, 1), dQ(ξ) = dξ ⇒ SME
Lin (1992). Characterization of distributions via moments.

Theorem (Lin 1): Let F_Y be the distribution of a positive random variable Y. Let (α_n) be a sequence of positive and distinct numbers in (0, A) for some A, satisfying lim_{n→∞} α_n = α_0 < A. If E[Y^A] < ∞, the sequence of moments E[Y^{α_n}] characterizes F_Y.

Theorem (Lin 2): Let F_Y be the distribution of a positive random variable Y. Let (α_n) be a sequence of positive and distinct numbers satisfying lim_{n→∞} α_n = 0 and Σ_{n≥1} α_n = ∞. Then the sequence of moments E[Y^{α_n}] characterizes F_Y.

Both results hinge on the fact that an analytic function is determined by its values on a countable set having an accumulation point in its domain of analyticity. The connection between the two theorems is explained in Lin's paper.