stt 459: actuarial models - michigan state...

STT 459: Actuarial Models

Albert Cohen

Actuarial Sciences ProgramDepartment of Mathematics

Department of Statistics and ProbabilityC336 Wells Hall

Michigan State UniversityEast Lansing MI

[email protected]@stt.msu.edu

Albert Cohen (MSU) STT 459: Actuarial Models MSU Spring 2015 1 / 193

Copyright Acknowledgement

Some examples and theorem proofs in these slides, and on in class exampreparation slides, are taken from our textbook ”Loss Models: From Datato Decisions” 4th edition ,by Klugman, Panjer, and Wilmott.

Please note that Wiley owns the copyright for that material. Noportion of the Wiley textbook material may be reproduced in anypart or by any means without the permission of the publisher. Weare very thankful to the publisher for allowing posting of thesenotes on our class website.

Also, we will from time-to-time look at problems from releasedprevious Exams C by the SOA. All such questions belong incopyright to the Society of Actuaries, and we make no claim onthem. It is of course an honor to be able to present analysis of suchexamples here.


Random Variables

Recall our notion of a sample space S = Ω, and attach to it a probabilitymeasure P and the set of possible events F .

In our previous experience, F = 2Ω.For now, we take this to be the case. We can define, on this ProbabilitySpace (Ω,F ,P) a Random Variable X as any function

X : Ω→ R (1)

Consider now the experiment of tossing 3 fair coins. Define the randomvariable Y = Number of heads that appear. Then

P[Y = 0] = P[(T ,T ,T )]P[Y = 1] = P[(T ,T ,H), (T ,H,T ), (H,T ,T )]P[Y = 2] = P[(T ,H,H), (H,T ,H), (H,H,T )]P[Y = 3] = P[(H,H,H)]


Random Variables

Recall our notion of a sample space S = Ω, and attach to it a probabilitymeasure P and the set of possible events F .In our previous experience, F = 2Ω.

For now, we take this to be the case. We can define, on this ProbabilitySpace (Ω,F ,P) a Random Variable X as any function

X : Ω→ R (1)




Random Variables

Recall our notion of a sample space S = Ω, and attach to it a probabilitymeasure P and the set of possible events F .In our previous experience, F = 2Ω.For now, we take this to be the case. We can define, on this ProbabilitySpace (Ω,F ,P) a Random Variable X as any function

X : Ω→ R (1)




Discrete Random Variables

Consider a random variable X that takes on a countable number of values,denoted by the set R ≡ x1, x2, .... For any a ∈ R, we define

p(a) = P[X = a]∞∑i=1

p(xi ) = 1(2)


Expected Value

For a random variable X that takes on a countable number of values,define

E[X ] =∑xi∈R

xip(xi ) (3)

As an important example, consider an event A in our σ−field and definethe random variable

IA =

1, if A occurs

0, if A does not occur(4)

Then E[IA] = P[A].


Variance

If X is a random variable with mean µ = E[X ], then the variance Var(X )and standard deviation SD(X ) are defined as

Var(X ) = E[(X − µ)2

]

= E[X 2]− E[X ]2

SD(X ) =√

Var(X )

(5)

A useful identity is

Var(a · X + b) = a2 · Var(X ) (6)

Also notice that E[X 2] ≥ (E[X ])2. This is true regardless of thedistribution of X .


Variance

If X is a random variable with mean µ = E[X ], then the variance Var(X )and standard deviation SD(X ) are defined as

Var(X ) = E[(X − µ)2

]= E

[X 2]− E[X ]2

SD(X ) =√

Var(X )

(5)

A useful identity is

Var(a · X + b) = a2 · Var(X ) (6)

Also notice that E[X 2] ≥ (E[X ])2. This is true regardless of thedistribution of X .


Bernoulli and Binomial Random Variables

Define a Bernoulli random variable X with R = 0, 1 and

p = p(1) = P [X = 1]

1− p = p(0) = P [X = 0](7)

This represents the outcome of 1 trial. Now, imagine this experiment iscarried out n times. Let Yn,p denote the outcome of this repeatedexperiment. We now have R = 0, 1, 2, .., n and

p(i) = P[Yn,p = i ] =

(n

i

)· pi · (1− p)n−i

E[Yn,p] = n · pVar(Yn,p) = n · p · (1− p)

(8)


The Poisson Random Variable

Recall our r.v. X with p(i) = e−λλi

i! . Straigthforward computation showsthat E[X ] = λ = Var(X )

X can in fact approximate Yn,p if λ = n · p, for large n.

As an example, consider an item that is produced by a certain machine.The probability that it will be defective is p = 0.1. The probability that, ina sample of 10, at most one item will be defective, is

P[Y10,0.1 ≤ 1] =

(10

0

)0.10 · 0.910 +

(10

1

)0.11 · 0.99

= 0.7361 ≈ 0.7358 = e−1 + e−1

= e−(10·0.1) · (10 · 0.1)0

1+ e−(10·0.1) · (10 · 0.1)1

1= P[X = 0] + P[X = 1]

(9)


Other Discrete Probability Distributions

Geometric Random Variable: Number of trials required until success :P[X = n] = (1− p)n−1 · p

E[X ] = p · 1 + (1− p) · E[1 + X ]⇒ E[X ] = 1p

E[X 2] = p · 12 + (1− p) · E[(X + 1)2]⇒ E[X 2] = 2−pp2

Var(X ) = E[X 2]− (E[X ])2 = 1−pp2

Negative Binomial Random Variable: n trials needed for r successes :P[Xr ,p = n] =

(n−1r−1

)pr · (1− p)n−r

E[Xr ,p] = rp

E[X 2r ,p] = r

p ·(

r+1p − 1

)Var(Xr ,p) = E[X 2

r ,p]− (E[Xr ,p])2 = r ·(1−p)p2




E[X ] = p · 1 + (1− p) · E[1 + X ]⇒ E[X ] = 1p

E[X 2] = p · 12 + (1− p) · E[(X + 1)2]⇒ E[X 2] = 2−pp2

Var(X ) = E[X 2]− (E[X ])2 = 1−pp2


(n−1r−1

)pr · (1− p)n−r

E[Xr ,p] = rp

E[X 2r ,p] = r

p ·(

r+1p − 1

)

Var(Xr ,p) = E[X 2r ,p]− (E[Xr ,p])2 = r ·(1−p)

p2




E[X ] = p · 1 + (1− p) · E[1 + X ]⇒ E[X ] = 1p

E[X 2] = p · 12 + (1− p) · E[(X + 1)2]⇒ E[X 2] = 2−pp2

Var(X ) = E[X 2]− (E[X ])2 = 1−pp2


(n−1r−1

)pr · (1− p)n−r

E[Xr ,p] = rp

E[X 2r ,p] = r

p ·(

r+1p − 1

)Var(Xr ,p) = E[X 2

r ,p]− (E[Xr ,p])2 = r ·(1−p)p2



Hypergeometric Random Variable: sample of size n chosen randomlyfrom population m white balls and N −m black balls. Let Zn,N,m

denote the number of white balls selected :

P[Zn,N,m = i ] =(m

i )(N−mn−i )

(Nn)

for i = 0, 1, .., n

E[Zn,N,m] = n·mN

E[Z 2n,N,m] = n·m

N ·(

(n−1)·(m−1)N−1 + 1

)Var(Zn,N,m) = E[Z 2

n,N,m]− (E[Zn,N,m])2 = n · mN ·(1− m

N

)·(

1− n−1N−1

)Zipf (Pareto) Distribution: Assume α > 0 and that

P[X = k] =

(∑∞l=1

1l1+α

)−1

k1+α . Then E[X ] =?





P[Zn,N,m = i ] =(m

i )(N−mn−i )

(Nn)

for i = 0, 1, .., n

E[Zn,N,m] = n·mN

E[Z 2n,N,m] = n·m

N ·(

(n−1)·(m−1)N−1 + 1


n,N,m]− (E[Zn,N,m])2 = n · mN ·(1− m

N

)·(

1− n−1N−1

)

Zipf (Pareto) Distribution: Assume α > 0 and that

P[X = k] =

(∑∞l=1

1l1+α

)−1






P[Zn,N,m = i ] =(m

i )(N−mn−i )

(Nn)

for i = 0, 1, .., n

E[Zn,N,m] = n·mN

E[Z 2n,N,m] = n·m

N ·(

(n−1)·(m−1)N−1 + 1


n,N,m]− (E[Zn,N,m])2 = n · mN ·(1− m

N

)·(

1− n−1N−1

)Zipf (Pareto) Distribution: Assume α > 0 and that

P[X = k] =

(∑∞l=1

1l1+α

)−1



Expectation is Linear

For a collection of random variables Xini=1, we haveE [∑n

i=1 Xi ] =∑n

i=1 E[Xi ].

As an example, if n die are rolled, what is the expected value for the totalnumber of dots?


CDF

Recall that for an r.v. X , we define F (b) := P[X ≤ b]. It follows from thecontinuity property of probability measures that

If a < b then F (a) < F (b)

limb→∞ F (b) = 1

limb→−∞ F (b) = 0

limbn↓b F (bn) = F (b)

As an example,

P[X < b] = P[

limn→∞

X ≤ b − 1

n

]= lim

n→∞P[

X ≤ b − 1

n

]= lim

n→∞F

(b − 1

n

) (10)


Continuous Random Variables

Recall our notion of a sample space S = Ω, and attach to it a probabilitymeasure P and the set of possible events F .

If we take a random variable that can take on any real value, then we callit a continuous random variable if attached to it there is a non-negativefunction f : R→ R+ where

P[X ∈ B] =

∫B

f (x)dx (11)

for any measurable set B. We can consider all practical sets to bemeasurable. Of course, it follows from our definition that we must haveP[X ∈ R] =

∫∞−∞ f (x)dx = 1. For intervals [a, b] and (−∞, b], we have

P[a ≤ X ≤ b] =

∫ b

af (x)dx

P[X ≤ b] =

∫ b

−∞f (x)dx

(12)


Continuous Random Variables

Recall our notion of a sample space S = Ω, and attach to it a probabilitymeasure P and the set of possible events F .If we take a random variable that can take on any real value, then we callit a continuous random variable if attached to it there is a non-negativefunction f : R→ R+ where

P[X ∈ B] =

∫B

f (x)dx (11)

for any measurable set B. We can consider all practical sets to bemeasurable. Of course, it follows from our definition that we must haveP[X ∈ R] =

∫∞−∞ f (x)dx = 1. For intervals [a, b] and (−∞, b], we have

P[a ≤ X ≤ b] =

∫ b

af (x)dx

P[X ≤ b] =

∫ b

−∞f (x)dx

(12)


Example

Suppose that X is c.r.v. with probability density function

f (x) =

C · (2− x), if 0 < x < 2

0, otherwise .(13)

What is C ? Compute P[X > 1]

Answer: Since f is a density, we require

1 =

∫ ∞−∞

f (x)dx = C ·∫ 2

0(2− x)dx = C · 2⇒ C =

1

2

P[X > 1] =

∫ ∞1

f (x)dx =1

2·∫ 2

1(2− x)dx

=1

2·[

(2(2)− 1

2(22))− (2(1)− 1

2(12))

]=

1

4

(14)


Example


f (x) =

C · (2− x), if 0 < x < 2

0, otherwise .(13)

What is C ? Compute P[X > 1]Answer: Since f is a density, we require

1 =

∫ ∞−∞

f (x)dx = C ·∫ 2

0(2− x)dx = C · 2

⇒ C =1

2

P[X > 1] =

∫ ∞1

f (x)dx =1

2·∫ 2

1(2− x)dx

=1

2·[

(2(2)− 1

2(22))− (2(1)− 1

2(12))

]=

1

4

(14)


Example


f (x) =

C · (2− x), if 0 < x < 2

0, otherwise .(13)

What is C ? Compute P[X > 1]Answer: Since f is a density, we require

1 =

∫ ∞−∞

f (x)dx = C ·∫ 2

0(2− x)dx = C · 2⇒ C =

1

2

P[X > 1] =

∫ ∞1

f (x)dx =1

2·∫ 2

1(2− x)dx

=1

2·[

(2(2)− 1

2(22))− (2(1)− 1

2(12))

]=

1

4

(14)


From c.d.f to p.d.f

Recall

F (y) = P[X ≤ y ] (15)

and so

P[

x − ∆x

2≤ X ≤ x +

∆x

2

]=

∫ x+ ∆x2

x−∆x2

f (u)du ≈ f (x)∆x . (16)

It follows that if Y = 2X , then

FY (x) := P[Y ≤ x ] = P[2X ≤ x ] = P[X ≤ 0.5x ]

= FX (0.5x)

fY (x) :=d

dxFY (x) =

d

dxFX (0.5x) = 0.5fX (0.5x).

(17)


Expectations

Define E[X ] =∫∞−∞ x · f (x)dx for any c.r.v. X with p.d.f. X . For any real

valued function g , we can define the expectation of the transformed c.r.vY = g(X ) as

E[g(X )] =

∫ ∞−∞

g(x) · f (x)dx . (18)


Expectations

The following Lemma is useful in many computations. The proof can beseen via an interchange (Fubini) of the order of integration.

Lemma

For any nonnegative random variable Y , we have E[Y ] =∫∞

0 P[Y > y ]dy .

Proof. ∫ ∞0

P[Y > y ]dy =

∫ ∞0

(∫ ∞y

fY (x)dx

)dy

=

∫ ∞0

(∫ x

0fY (x)dy

)dx

=

∫ ∞0

x · fY (x)dx = E[X ].

(19)


Uniform Random Variables

If we define

f (x) :=

1

B − A, if A < x < B

0, otherwise(20)

then the distribution F is comptuted to be

F (x) =

0, if x ≤ A

x − A

B − A, if A < x < B

1, if x ≥ B

(21)

and so

E [X ] =

∫ ∞−∞

xf (x)dx =

∫ B

Ax · 1

B − A=

B + A

2

Var(X ) = E[X 2]−(

B + A

2

)2

=(B − A)2

12.

(22)


Optimizing Under Uniform Uncertainty

A 1−unit long city block has a mailbox somewhere at a point U that isuniformly distributed over (0, 1). Determine the expected length of thesection of the block that contains an envelope store E , where 0 ≤ E ≤ 1.Also, for what value of E is this expected value maximized ?


Optimizing Under Uniform Uncertainty

Take LE (U) as the length of the sub-block that contains the envelopestore E and note that

LE (U) =

1− U, if U < E

U, if U > E .(23)

It follows that

E [LE (U)] :=

∫ 1

0LE (u) · f (u)du =

∫ 1

0LE (u)du

=

∫ E

0(1− u)du +

∫ 1

Eudu =

1

2+ E (1− E )

max0≤E≤1

E [LE (U)] =1

2+

1

2

(1− 1

2

)=

3

4

(24)


Normal Random Variables

We say that X is a Normal Random Variable with parameters µ, σ2 if

fX (x) =1√2πσ

e−(x−µ)2

2σ2 for −∞ < x <∞ (25)

Furthermore, we say that Z is a Standard Normal Random Variable if it isNormal with parameters µ = 0, σ = 1. Consider

z =x − µσ∫ ∞

−∞fX (x)dx =

∫ ∞−∞

1√2πσ

e−(x−µ)2

2σ2 dx

=

∫ ∞−∞

1√2π

e−(z)2

2 dz

=

∫ ∞−∞

fZ (z)dz

(26)


Normal Random Variables

We user the notation X ∼ N(µ, σ2) and Z ∼ N(0, 1). By ourtransformation above, it can be seen that if Z ∼ N(0, 1), thenX = µ+ σ · Z ∼ N(µ, σ2). We can see this via

Φ(z) := FZ (z) = P[Z ≤ z ]

= P[

X − µσ

≤ x − µσ

]= P[X ≤ x ] =: FX (x)

1

σfZ

(x − µσ

)=

d

dxFZ (z)

=d

dxFX (x) = fX (x)

fZ (z) =1√2π

e−(z)2

2

(27)


DeMoivre-Laplace Limit Theorem

Theorem

If Sn is the number of successes that occur when n independent trials,each resulting in a success with probability p, are performed, then for anya < b, we have

limn→∞

P

[a ≤ Sn − n · p√

n · p · (1− p)≤ b

]= P[a ≤ Z ≤ b]

= Φ(b)− Φ(a)

(28)


Exponential Random Variables

For any λ > 0, define the random variable X with p.d.f.

f (x) =

λe−λx , if x ≥ 0

0, if x < 0.(29)

Then, if x ≥ 0, we have F (x) =∫ x

0 λe−λydy = 1− e−λx .



Furthermore,using integration by parts, we also compute

E[X n] =

∫ ∞0

xnλe−λxdx

= 0 +

∫ ∞0

nxn−1e−λxdx

=n

λ

∫ ∞0

xn−1λe−λx

=n

λE[X n−1]

E[X 1] =1

λE[1] =

1

λ

E[X 2] =2

λE[X ] =

2

λ2

Var(X ) =1

λ2

(30)



Notice that X is a Memoryless random variable:

P[X > s + t | X > t] =P[X > s + t]

P[X > t]

=1− (1− e−λ(t+s))

1− (1− e−λt)

=e−λ(t+s)

e−λt

= e−λs = P[X > s]

(31)


Hazard Rates

Define the Hazard Rate for any crv X as

λ(t) :=f (t)

1− F (t)=

ddt F (t)

1− F (t)(32)

It follows, by solution of the above differential equation, that anequivalent definition is

F (t) = 1− e−∫ t

0 λ(s)ds

F (0) = 1(33)

Crv’s may be defined via their Hazard rates. For example:

Rayleigh distributed random variable: λ(t) = a + btExponential distributed random variable: λ(t) = λ.


Gamma Distribution

Consider a crv X with p.d.f.

f (x) =

λe−λx(λx)α−1

Γ(α), if x ≥ 0

0, if x < 0.

(34)

where the Gamma function Γ(α) is defined as∫∞

0 e−yyα−1dy . Integrationby parts shows that for α > 1

Γ(α) = (α− 1)Γ(α− 1)

Γ(n) = (n − 1)! for n ∈ N(35)


Gamma Distribution

Assume that you are waiting for n events to occur, and that N(t) is thenumber of events that occur up to time t, which is assumed to be Poisson(λt) distributed. Let Tn denote the time at which the nth event occurs.


Gamma Distribution

It follows that

F (t) = P[Tn ≤ t]

= P[N(t) ≥ n]

=∞∑j=n

P[N(t) = j ]

=∞∑j=n

e−λt(λt)j

j!

f (t) = F ′(t) =∞∑j=n

e−λt · j · (λt)j−1 · λj!

−∞∑j=n

λe−λt(λt)j

j!

=λe−λt(λt)n−1

(n − 1)!

(36)


Weibull Distribution


f (x) =

0, if x ≤ ν

β

α

(x − να

)β−1

e−( x−να )

β

, if x > ν(37)

The corresponding cdf is

F (x) =

0, if x ≤ ν

1− e−( x−να )

β

, if x > ν(38)


Cauchy Distribution

Consider a crv θ ∼ U[−π

2 ,π2

], and a transformation to another random

variable X = tan (θ). Then

F (x) = P[X ≤ x ]

= P[tan (θ) ≤ x ]

= P[θ ≤ tan−1 (x)]

=1

π

(tan−1 (x)−

(−π

2

))=

1

2+

1

πtan−1 (x)

⇒ f (x) = F ′(x) =1

π

1

1 + x2

(39)


Beta Distribution


f (x) =

xa−1(1− x)b−1

B(a, b), if 0 < x < 1

0, otherwise.

(40)

where B(a, b) is defined as∫ 1

0 xa−1(1− x)b−1dx . It can be shown that

B(a, b) =Γ(a)Γ(b)

Γ(a + b)

E[X ] =a

a + b

Var(X ) =ab

(a + b)2(a + b + 1)

(41)


Distribution of a Function of a Random Variable

Theorem

Let X be a continuous random variable with p.d.f. fX . Suppose that g(x)is a strictly monotonic, differentiable function of x. Then the randomvariable Y = g(X ) has a p.d.f. defined by

fY (y) =

fX[g−1(y)

]| d

dyg−1(y) |, if, for some x, y = g(x)

0, if, for all x, y 6= g(x)

(42)


Distribution of a Function of a Random Variable

Proof.

Suppose that y = g(x) for some x . Then, with Y = g(X ),

fY (y) =d

dyFY (y)

=d

dyP[g(X ) ≤ y ]

=d

dyP[X ≤ g−1(y)]

=d

dyFX

(g−1(y)

)= fX

(g−1(y)

) d

dyg−1(y)

(43)

If y 6= g(x) for any x , then F (y) is either 0 or 1 ⇒ fY (y) = 0.


Examples (Can we use the previous theorem?)

Let X have p.d.f fX , and define Y = X 2, Z =| X |. Then

fY (y) =d

dyFY (y)

=d

dyP[X 2 ≤ y ]

=d

dyP[−√y ≤ X ≤ √y ]

=d

dy(FX (

√y)− FX (−√y))

=1

2√

y[fX (√

y) + fX (−√y)]

fZ (z) =d

dzP[−z ≤ X ≤ z ]

= fX (z) + fX (−z)

(44)


Mixed Distributions

There are random variables X such that, for a countable number of pointsxi ,

P[X = xi ] = pi > 0∞∑i=0

pi ≤ 1.(45)

As an example, consider X such that

P[X = 0] =1

2

fX (x) =1

2e−x for x > 0.

(46)


HW

Please work through

2.2

2.3, 2.4, 2.5


Raw and Central Moments

Given a random variable X , we define the kth Raw Moment via,

µ′k := E[X k ] (47)

and the kth Central Moment via,

µk := E[(X − µ′1)k ]. (48)

HW: Compute the first 5 raw and central moments for a random variableX with fX (x) = λe−λx for all x ≥ 0.


Variance, Skewness, and Kurtosis

Given a random variable X , we define

µ′1 := µ = Mean

µ2 = E[(X − µ′1)2] := σ2 = Varianceσ

µ:= Coefficient of Variation

γ1 :=µ3

σ3= Skewness

γ2 :=µ4

σ4= Kurtosis

(49)


Excess Loss

Given a random variable X , a value u ∈ R, and a value d ∈ R withP[X > d ] > 0 we define

Y P := X − d = Excess Loss Variable

eX (d) := E[Y P | Y P > 0] = Mean Excess Loss Function

ekX (d) := E[(Y P)k | Y P > 0]

Y L := (X − d)+ = Left Censored and Shifted Variable

Y := X ∧ u = Limited Loss Variable

E[Y ] := Limited Expected Value.

(50)


HW

Please work through

3.2

3.3, 3.4, 3.5

3.6, 3.7, 3.11

3.14, 3.15, 3.16


Percentiles and Moment Generating Functions

Define (any number) πp as the 100pth percentile if it satisfies

F (π−p ) ≤ p ≤ F (πp). (51)

The 50th percentile is also called the median. For all z ∈ R, we define

MX (z) := E[ezX ] = Moment Generating Function

PX (z) := E[zX ] = Probability Generating Function(52)

for which these expectations exist. Note that MX (z) = PX (ez).


Moment Generating Functions

Theorem

Let Sk = X1 + ...+ Xk , where the X are independent. Then, provided thefollowing exist:

MSk (z) = Πkj=1MXj

(z)

PSk (z) = Πkj=1PXj

(z)(53)


Example

Consider a loss model where at time t > 0, the liability with a randomlyselected individual is distributed as

P[Xt ≥ x ] = e−xt for x ≥ 0, t > 0

P[X0 = 0] = 1.(54)

It follows that

MXt (z) = E[ezXt ] =

∫ ∞0

ezx fXt (x)dx =

∫ ∞0

ezx · − d

dx

(e−

xt

)dx

=

∫ ∞0

ezx1

te−

xt dx =

1

t

∫ ∞0

e−( 1t−z)xdx =

1

t

11t − z

=1

1− zt.

(55)

HW: For Sk(t) :=∑k

j=1 X jt , compute MSk (t)(z).


Example

For a standard Normal random variable Z ∼ N(0, 1), we have

MZ (t) = E[etZ ] =

∫ ∞−∞

etz fZ (z)dx =

∫ ∞−∞

etz1√2π

e−12z2

dz

=

∫ ∞−∞

e12t2 1√

2πe−

12

(z−t)2dz = e

12t2.

(56)

Therefore, for Y = µ+ σZ ∼ N(µ, σ2), it follows that

MY (t) = E[et(µ+σZ)] = etµE[e(tσZ)] = etµe12

(tσ)2= etµ+ 1

2t2σ2

. (57)


Example

Consider an i.i.d sequence Xini=1 such that all the Xi ∼ X , where for aparameter θ > 0,

fX (x) =1

θe−

xθ (58)

Define Sn := X1+...+Xnn , and so

E[Sn] = θ

Var[Sn] =θ2

n.

(59)


Example

It follows that

MSn(t) = E[etSn ] =

[E[e

tnX]]n

=[MX

( t

n

)]n=

[1

1− tθn

]n

∴ limn→∞

MSn(t) = lim

n→∞

[1

1− tθn

]n= etθ

∴ PSn⇒ PS := N(θ, 0).

(60)


Example

Furthermore, define

Zn :=Sn − E[Sn]√

Var[Sn]=√

n · Sn − θθ

. (61)

It follows that

MZn(t) = E[etZn ] = E[et√n· Sn−θ

θ

]= e−t

√nE[et√n· Sn

θ

]= e−t

√nMSn

(t√

n

θ

)= e−t

√n

[1

1− t√n

]n→ e

12t2

∴ PZn ⇒ PZ := N(0, 1).

(62)


HW

Please work through

3.21

3.23

3.24


Heavy Tails

Consider a pair of random variables (X1,X2) with E[X1] = E[X2], andrecall the notation

Si (x) := P[Xi > x ] = 1− Fi (x). (63)

It can be shown via L’Hopital’s rule that

limx→∞

S1(x)

S2(x)= lim

x→∞

f1(x)

f2(x). (64)

If limx→∞S1(x)S2(x) =∞, then we say that X1 has a heavier tail than X2,

based on limiting tail behavior.


Heavy Tails

Individually, for a random variable X if

d

du(eX (u)) > 0 (65)

we say that X has a heavy tail based on mean excess loss function.

Often, the standard for comparison is the exponential distribution todetermine a heavy tail: X is heavier tailed than any exponentialdistribution if for all λ > 0,

limx→∞

eλxP[X > x ] =∞. (66)

HW: Please work through

3.25

3.29

3.30


Parametric Distributions

A Parametric Distribution is determined by a set of parameters. Asubset of set of parametric distributions, known as Scale Distributions, isdetermined by scale parameters. In this case, the r.v. X that is linked tothe parametric distribution FX (x) can be transformed by scalarmultiplication into Y = cX , and the resulting distribution FY (y) = FX

( yc

)is in the same class of distributions.

For example, if X ∼ exp(λ) then FX = 1− e−λX . Then,

FY (y) = 1− e−λcy and so Y ∼ exp(λc ).


Discrete Mixture Distributions

A random variable X is k-point mixture of (X1, ..,Xk) if

FX (x) = a1FX1(x) + ...+ akFXk(x)

1 =k∑

n=1

an.(67)

For example, let Θ be another random variable, with

ai = P[Θ = i ]

1 = P[X = Xi | Θ = i ]

FXi(x) = P[Xi ≤ x ] = P[X ≤ x | Θ = i ].

(68)

It follows that

P[X ≤ x ] =k∑

i=1

P[X ≤ x | Θ = i ]P[Θ = i ] =k∑

i=1

aiFXi(x). (69)



A random variable X is variable-component mixture of (X1, ..,XN) if Ncan be any positive integer, with

FX (x) = a1FX1(x) + ...+ aNFXN(x)

1 =N∑

n=1

an.(70)

For example, let Θ, Θ be random variables, with

p = P[Θ = 0] = 1− P[Θ = 1]

pi = P[Θ = i ].(71)

It follows that

P[X ≤ x ] = pFX0(x) + (1− p)FX1(x)

P[X ≤ x ] =N∑i=1

aiFXi(x).

(72)



A Data-Dependent Distribution is at least as complex as the datathat produced it. As the number of data points increases, so does thenumber of parameters.

An Empirical Model is associate with a discrete model that assignsequal probability 1

n to each of the n data points.

HW: 4.1, 4.2, 4.5, 4.7, 4.10.


Raising To A Power

Let X be a random variable, and Y = X γ its transformation. Then wedefine

Y =

Transformed : γ > 0Inverse : γ = −1Inverse Transformed : γ < 0, γ 6= −1.


Gamma and Transformed Gamma Distribution

Recall our definitions

Γ(α) =

∫ ∞0

tα−1e−tdt

Γ(α; x) =

∫ x0 tα−1e−tdt

Γ(α).

(73)

If X ∼ Gamma(α), the distribution we use is Γ(x ;α) to transform X intoY = X γ .


Transformed Exponential Distribution

If SX (x) = e−x , then assuming γ > 0

Y =

Transformed : SY (y) = e−y

γ

Inverse Transformed : SY (y) = e−y−γ

Weibull : SY (y) = e−( θy

)γ


Mixing

Define X | Λ as a conditional random variable, andSX |Λ(x | λ) = 1− FX |Λ(x | λ) as its conditional survival (probability)function.

Theorem

Define fΛ(λ) as the density of our parameter Λ, and fX (x) as theunconditional density for our random variable X . Then

fX (x) =

∫R

fX |Λ(x | λ)fΛ(λ)dλ

FX (x) =

∫R

FX |Λ(x | λ)fΛ(λ)dλ

µ′k = E[X k ] = E[E[X k | Λ]

]µ2 = Var[X ] = E [Var[X | Λ]] + Var [E[X | Λ]] .

(74)


Mixing

Theorem

If X | Λ ∼ exp(Λ) and Λ ∼ Gamma, then X ∼ Pareto.

Proof.

By our previous theorem,

fX (x) =

∫R

fX |Λ(x | λ)fΛ(λ)dλ

=

∫ ∞λ=0

λe−λxλα−1e−θλθα

Γ(α)dλ

=θα

Γ(α)

∫ ∞λ=0

λαe−λ(x+θ)dλ =θα

Γ(α)

Γ(α + 1)

(x + θ)α+1.

(75)


Mixing

Theorem

If X | Λ ∼ N (Θ, σ) and Θ ∼ N(µ, α), then X ∼ N(µ, σ + α).

Proof.

By our previous theorem, and by definining σ = σασ+α and µ = σµ+αx

σ+α ,

fX (x) =

∫R

fX |Θ(x | θ)fΘ(θ)dθ =

∫ ∞θ=−∞

1√2πσ

e−(x−θ)2

2σ1√2πα

e−(θ−µ)2

2α dθ

=1√

2π(σ + α)e− (x−µ)2

2(σ+α)

∫ ∞θ=−∞

1√2πσ

e−(θ−µ)2

2σ dθ

=1√

2π(σ + α)e− (x−µ)2

2(σ+α) .

(76)


Frailty

A Fraility random variable Λ > 0 is one where the hazard rate is, for aknown function a(x),

µX |Λ = Λ · a(x). (77)

It follows that for A(x) :=∫ x

0 a(s)ds,

SX |Λ(x | λ) = e−λ·A(x)

SX (x) = E[e−Λ·A(x)

]= MΛ[−A(x)]

(78)


Frailty

For example, consider an exponential mixture (i.e. A(x) = x) and that ourparameter is uniformly distributed: Λ ∼ U[0, L].Then

SX (x) =

∫ ∞0

SX |Λ(x | λ) · fΛ(λ)dλ

=

∫ L

0e−λx · 1

Ldλ

=1

Lx· (1− e−Lx).

(79)

HW: 5.7.


Splicing

A k-component spliced distribution has density

fX (x) =k∑

n=1

ak fk(x)1(cn−1,cn) (80)

HW: Can you splice together an exponential and a Pareto distribution?Give an example if you can!

HW:

5.1,5.3,5.6,5.9,5.12

5.13,5.14,5.20

Limiting Distributions: 5.10

5.21,5.22,5.26


Linear Exponential Family

A random variable X has a linear exponential density if

f (x ; θ) =p(x)er(θ)x

q(θ). (81)

For example, if X ∼ N(θ, σ), then

f (x ; θ) =1√

2πσ2e−

12σ2 (x−θ)2

=

[1√

2πσ2e−

x2

2σ2

] [eθσ2 x]

eθ2

2σ2

.

(82)


Linear Exponential Family

∴ ln [f (x ; θ)] = ln [p(x)] + r(θ)x − ln q(θ)

⇒ ∂f

∂θ=[r ′(θ)x − q′(θ)

q(θ)

]f (x ; θ)

⇒ 0 =∂

∂θ[1] =

∂

∂θ

[ ∫ x=∞

x=−∞f (x ; θ)dx

]=

∫ x=∞

x=−∞

∂f

∂θdx =

∫ x=∞

x=−∞

[r ′(θ)x − q′(θ)

q(θ)

]f (x ; θ)dx

= r ′(θ)E[X ]− q′(θ)

q(θ).

⇒ E[X ] =q′(θ)

r ′(θ)q(θ)=: µ(θ)

⇒ Var[X ] =µ′(θ)

r ′(θ)(Proof: Left as an exercise!).

(83)


Counting Distributions

Let N be a discrete random variable with

P[N = k] := pk ≥ 0. (84)

It follows that

P(z) = PN(z) = E[zN ] :=∞∑k=0

pkzk

P(m)(z) =dm

dzmE[zN ] = E

[ dm

dzm(zN)

]= E

[N(N − 1) · .. · (N −m + 1)zN−m

]=∞∑

k=m

k(k − 1) · ... · (k −m + 1)zk−mpk .

∴ P(m)(0) = m!pm.

(85)


Poisson Distributions

For example, consider the Poisson distribution with pk = e−λλk

k! for allk ≥ 0. We can compute

P(z) = PN(z) = E[zN ] :=∞∑k=0

e−λλk

k!zk

=∞∑k=0

e−λ(λz)k

k!

= e−λ∞∑k=0

(λz)k

k!= e−λeλz

= eλ(z−1).

(86)


Poisson Distributions

Theorem

If Njnj=1 is a sequence of i.i.d. Poisson random variables with respective

parameters λjnj=1, then

N := N1 + ...+ Nn ∼ Poisson(λ1 + ...+ λn). (87)

Proof.

PN(z) = Πnj=1PNj

(z) = Πnj=1eλj (z−1) = e(

∑nj=1 λj)(z−1). (88)


Poisson Distributions : Parameter Uncertainty

Theorem

Assume Λ has a density fΛ(λ), and P[N = k | Λ = λ] = e−λλk

k! .Then

pk = P[N = k] =

∫ ∞0

e−λλk

k!fΛ(λ)dλ =

1

k!E[Λke−Λ]. (89)

For example, if fΛ(λ) = λα−1e−λθ

θαΓ(α) , then

pk =1

k!

∫ ∞0

e−λλkλα−1e−

λθ

θαΓ(α)dλ =

1

k!

1

θαΓ(α)

∫ ∞0

e−λ(1+ 1θ

)λk+α−1dλ

=Γ(k + α)

k!Γ(α)

θk

(1 + θ)k+α.

(90)

HW: Compute pk if Λ ∼ Exp(λ).


Poisson Distributions : Parameter Uncertainty

Theorem

Assume Λ has a density fΛ(λ), and P[N = k | Λ = λ] = e−λλk

k! .Then

pk = P[N = k] =

∫ ∞0

e−λλk

k!fΛ(λ)dλ =

1

k!E[Λke−Λ]. (89)

For example, if fΛ(λ) = λα−1e−λθ

θαΓ(α) , then

pk =1

k!

∫ ∞0

e−λλkλα−1e−

λθ

θαΓ(α)dλ =

1

k!

1

θαΓ(α)

∫ ∞0

e−λ(1+ 1θ

)λk+α−1dλ

=Γ(k + α)

k!Γ(α)

θk

(1 + θ)k+α.

(90)HW: Compute pk if Λ ∼ Exp(λ).


Binomial Distributions Revisited

Define X and N such that for a sequence Xjnj=1 with each of theXj ∼ X and

p = P[X = 1] = 1− P[X = 0]

N = N1 + ...+ Nn.(91)

It follows that

PX (z) = (1− p)z0 + pz1 = 1− p + pz

⇒ PN(z) = [1− p + pz ]n

⇒ P[N = k] =m!

(m − k)!k!pk(1− p)m−k .

(92)

We can also show that

E[N] = np

Var[N] = np(1− p).(93)


(a, b, 0) class

A discrete random variable N is a member of the (a, b, 0) class if , forpk = P[N = k] and all k ∈ 1, 2, 3...,

pk

pk−1= a +

b

k. (94)

It follows that

kpk = akpk−1 + bpk−1 = a(k − 1)pk−1 + (a + b)pk−1

⇒∞∑k=1

kpk = a∞∑k=1

(k − 1)pk−1 + (a + b)∞∑k=1

pk−1

∴ E[N] = aE[N] + (a + b) · 1 =a + b

1− a.

(95)

HW: Ex. 6.1, 6.2.


(a, b, 0) class: Fitting to data

By definition, if we take succesive probabilities (frequencies) of observeddata, and we know that the data is drawn from an (a, b, 0)-classdistribution, then it must be that for yk := k pk

pk−1,

yk = ak + b. (96)

For example, if we observe f total data points, and fk are the total numberof observed points with property k , then

pk =fkf⇒ yk = k

fkfk−1

. (97)

To fit, plot yk against k, and fit for a and b.HW: Read on the (a, b, 1) class, read pages 90− 94, and complete Ex.6.4, 6.6.


(a, b, 0) class: Fitting to data

According to results by Sundt and Jewell, and extended in Hess,Liewald, and Schmidt, the three distributions in this class are

Distribution a b p0 pk

Poisson 0 λ e−λ e−λ λk

k!

Negative Binomial 1− p (1− p)(r − 1) pr Γ(r+k)k!Γ(r) pr (1− p)k

Binomial − p1−p (n + 1) p

1−p (1− p)n(nk

)pk(1− p)n−k

We can use this table to match empirically observed data with anappropriate distribution.


http://www.casact.org/library/astin/vol12no1/27.pdf



(a, b, 0) class: Finan - Example 28.1

Let N be a member of C (a, b, 0) satisfying the recursive probabilities

kpk

pk−1=

3

4k + 3. (98)

Identify the distribution N.

Answer:As a > 0, b > 0, it follows that N is a negative binomial distribution.Furthermore,

3

4= a = 1− p

3 = b = (1− p)(r − 1) =3

4(r − 1)

⇒ (p, r) = (0.25, 5).

(99)



The distribution of accidents for 84 randomly selected policies is as follows:

Number of Accidents Number of Policies

0 32

1 26

2 12

3 7

4 4

5 2

6 1

Identify the frequency model that best represents these data



The distribution of accidents for 84 randomly selected policies, and theiradjusted frequency ratios, are as follows:

Number of Accidents Number of Policies k pkpk−1

1 26 0.8125

2 12 0.9231

3 7 1.75

4 4 2.2857

5 2 2.5

6 1 3

Best fit line is 0.462969k + 0.25816, hence a negative binomialdistribution. Check out Wolfram Alpha’s computation by entering ”linearregression (1, 0.8125), (2, 0.9231), (3, 1.75), (4, 2.2857), (5, 2.5), (6, 3)” inthe search bar.


http://www.wolframalpha.com

(a, b, 0) class: Interpretation

Consider now the distribution of accidents for randomly selected policies isas follows:

Number of Accidents Number of Policies

0 400

1 200

2 50

3 8

4 1

Identify the frequency model that best represents these data


(a, b, 0) class: Interpretation

The distribution of accidents for 84 randomly selected policies, and theiradjusted frequency ratios, are as follows:

Number of Accidents Number of Policies k pkpk−1

1 200 1 · 200400

2 50 2 · 50200

3 8 3 · 850

4 1 4 · 18

Best fit line is −0.002k + 0.5. Is this a Binomial or Poisson distribution?Check out Wolfram Alpha’s computation by entering ”linear regression(1, 1 · 200

400 ), (2, 2 · 50200 ), (3, 3 · 8

50 ), (4, 4 · 18 )” in the search bar.


http://www.wolframalpha.com

Multiple Exposures to Risk

A bank, insurance company, hedge fund, or pension sponsor among otherentities is charged with navitgating the risks borne by their endeavor. Inorder to mitigate and price such risks, one must first set out to identifytheir various forms.


Multiple Exposures to Risk

A few known variants include

Market Risk.

Interest Rate Risk.

FX Risk.

Liquidity Risk.

Credit Risk.


Market Risk

Exposure to capital markets.

Can affect both assets and liabilities.

Examples include equities and other financial instruments.


Interest Rate Risk

Exposure to changes in short rate.

Underestimating volatility of short rate evolution (shocks to economy.)

Exposure to changes in yield curve and term structure.


FX Risk

Inability to predict or affect foreign government domestic policy(prime rate) and foreign policy (devaluation.)

Can lead to inability to properly hedge financial instruments held inforeign countries.

Foreign interest rate fluctuation can lead to possible shortfalls thatbanks may wish to immunize their portfolio against. Currencyfluctuation also leads to daily operational risk, and should factor intoplanning.

For example, should a band tour Europe or America this summer ?Click here for an insightful article by Neil Shah in the Wall StreetJournal TM, with comment from the manager of a very prominentrock band


http://online.wsj.com/article/SB10001424052970204630904577056123331660042.html

Liquidity Risk

Exposure to depressed prices in return for expedited sale (liquidation)of assets.

Also affects strategies that hedge long term contracts with short termassets.

Banks are also prone to this in the form of bank runs.


Credit Risk

This also falls under Default Risk, where one is exposed to(counterparty) default on contracts that exchange cash flows.

Correlated with Market Risk, although different opinions on how so.

For life insurance companies, a life (x) can be seen as a default risk ifthat person outlives the present value of their retirement annuity.Also known as Longevity (Mortality) Risk.

This happens, for example, when pension sponsors overestimate themortality of the insured population. Reinsurance can be bought, butsuch reinsurers are also susceptible to the previously defined risks,depending on their asset and liability portfolio.As an example, please refer to the paper by Tsai, Tzeng, and Wang onHedging Longevity Risk When Interest Rates Are Uncertain


http://www.tandfonline.com/doi/abs/10.1080/10920277.2011.10597617

http://www.tandfonline.com/doi/abs/10.1080/10920277.2011.10597617

Credit Risk

Complicated to measure and perhaps even define:

Risk = P[Default]?Risk = E[Loss | Default]?or...Risk = E[Loss] = E[Loss | Default]× P[Default]?However, does E[Loss | Default] relate in some fashion to P[Default] ?In an adverse economy mired in recession, perhaps so. In fact,Frye inhis paper ”Collateral Damage” (2000) argues that the same factorsthat increase default rates can also decrease the value of loan collateral.Click here for a review of Structural Models for Credit Riskpublishedby the SOA.

Clearly, regulators, counterparties, clearinghouses, and shareholders wouldlike a precise definition of the risk involved with an entity’s business. Onepossible definition is the capital a business entity should hold. So what todo?


http://www.bis.org/bcbs/events/oslo/frye.pdf



http://www.soa.org/library/newsletters/risk-management-newsletter/2009/june/jrm-2009-iss16-wang.aspx


Credit Risk


Risk = P[Default]?Risk = E[Loss | Default]?

or...Risk = E[Loss] = E[Loss | Default]× P[Default]?However, does E[Loss | Default] relate in some fashion to P[Default] ?In an adverse economy mired in recession, perhaps so. In fact,Frye inhis paper ”Collateral Damage” (2000) argues that the same factorsthat increase default rates can also decrease the value of loan collateral.Click here for a review of Structural Models for Credit Riskpublishedby the SOA.








Credit Risk


Risk = P[Default]?Risk = E[Loss | Default]?or...Risk = E[Loss] = E[Loss | Default]× P[Default]?

However, does E[Loss | Default] relate in some fashion to P[Default] ?In an adverse economy mired in recession, perhaps so. In fact,Frye inhis paper ”Collateral Damage” (2000) argues that the same factorsthat increase default rates can also decrease the value of loan collateral.Click here for a review of Structural Models for Credit Riskpublishedby the SOA.








Credit Risk


Risk = P[Default]?Risk = E[Loss | Default]?or...Risk = E[Loss] = E[Loss | Default]× P[Default]?However, does E[Loss | Default] relate in some fashion to P[Default] ?

In an adverse economy mired in recession, perhaps so. In fact,Frye inhis paper ”Collateral Damage” (2000) argues that the same factorsthat increase default rates can also decrease the value of loan collateral.Click here for a review of Structural Models for Credit Riskpublishedby the SOA.








Credit Risk


Risk = P[Default]?Risk = E[Loss | Default]?or...Risk = E[Loss] = E[Loss | Default]× P[Default]?However, does E[Loss | Default] relate in some fashion to P[Default] ?In an adverse economy mired in recession, perhaps so. In fact,Frye inhis paper ”Collateral Damage” (2000) argues that the same factorsthat increase default rates can also decrease the value of loan collateral.Click here for a review of Structural Models for Credit Riskpublishedby the SOA.








Stochastic Actuarial Reserving (Ko and Shiu)

Fix a probability space(

Ω,F ,P)

and a standard Brownian motion W

that lives on this space.

Consider now a term contact with term T and let α denote themanagement charges factor along with β representing the policyholder’sparticipation factor.

Furthermore, assume mean and standard deviation parameters (µ, σ)respectively and the corresponding Geometric Brownian Mutual Fund Asset

St = S0eµt+σWt . (100)



Using this as the model of the asset returns upon which premiums areinvested, the policyholder wishes to purchase a contract that pays amaturity benefit credited at a rate of return which is the greater of

the customer’s risk discount rate r , where r < µ or

the participation rate of the stock index returns of S .

Symbolically, for a current premium P invested in the , the contractpayout value at maturity is

V (T ) = (1− α)P max

erT , 1 + β

(ST

S0− 1)

. (101)



Assume that the policyholder is able to fully participate in the returns fromthe fund (i.e. β = 1.)Then

V (T ) = (1− α)PST

S0+ (1− α)P max

erT −

(ST

S0

), 0

:= V1(T ) + V2(T ).

(102)

Here, V1(T ) is the net premium, or payoff, for investing in the index fundand V2(T ) is the guaranteed option payoff if the index fundunder-performs relative to the risk discount rate r .

How does one reserve to meet the obligations of V2(T ).



One can see that the probability of a payout, that V2(T ) 6= 0 is for large T

P[V2(T ) 6= 0] = P[rT > µT + σWT ] = Φ( r − µ

σ

√T)≈ 0. (103)

Since it is a low probability event that we have to prepare for a payoutV2(T ) and since we can directly replicate the payoff V1(T ) by initially

purchasing (1−α)PS0

units of the index fund, an actuary may be tempted tonot reserve for the uncertain portion of the guarantee, V2(T ), if thecontract has a relatively long term T .

Is this a wise decision?


Return Measures

A bank, insurance company, hedge fund, or pension sponsor can have theirefficiency measured by how they return profits over a time horizon.For example, if the value of a portfolio at time t is denoted by X (t), thenreturns are measured at times t0, t1, t2, ..., tn and the Loss in period(tk−1, tk) is

L[X ](tk) := − (X (tk)− X (tk−1)) (104)

and the Loss Distribution is

r [X ](tk) := − lnX (tk)

X (tk−1)(105)


Return Over Benchmark

Comparing to a benchmark rB(t) for returns, a manager can be rated via

RoB :=1

tn − t0

n∑k=1

(r [X ](tk)− rB(tk)) (106)

as well as the deviation away from the benchmark returns:

Dev :=

√√√√ 1

tn − t0

n∑k=1

(r [X ](tk)− rB(tk))2 (107)

Notice that RoB is a linear measure.


Sharpe Ratio

Recall from (financial) economics that there is a measure of return vs risk,known as the Sharpe Ratio. This measures the reward gained versus therisk ventured to return the gains won.

If α is the realized return on anasset over a risk free rate r , then for a standard deviation σ of the returnson the asset we have the informational measure

Sharpe Ratio :=α− r

σ(108)


Sharpe Ratio

Recall from (financial) economics that there is a measure of return vs risk,known as the Sharpe Ratio. This measures the reward gained versus therisk ventured to return the gains won. If α is the realized return on anasset over a risk free rate r , then for a standard deviation σ of the returnson the asset we have the informational measure

Sharpe Ratio :=α− r

σ(108)


Sharpe Ratio

Going back to our periodic returns, we can enact a similar measure ofperformance via

S :=1

tn−t0

∑nk=1 (r [X ](tk)− rB(tk))√

1tn−t0

∑nk=1 (r [X ](tk)− rB(tk))2

(109)

The numerical values of S can be used to rank manager performance, butas with any models there are criticisms. First off is the usefulness of thedenominator as a risk measure.


Risk Measures

Define a probability space (Ω,F ,P). Define the set A ⊂ L0(Ω,F ,P) ofpossible risks, where

X ,Y ∈ A ⇒ X + Y ∈ A.

X ∈ A ⇒ αX ∈ A for all α ∈ [0, 1].

X ∈ A ⇒ X + a ∈ A for all a ∈ R.


Risk Measures

A risk measure ρ : A → R ∪ ∞ may be required to satisfy some of thefollowing properties:

Law Invariance: ∀X ,Y ∈ A, x ∈ R,P[X ≤ x ] = P[Y ≤ x ]⇒ ρ[X ] = ρ[Y ] .

Monotonicity: ∀X ,Y ∈ A, P[X ≤ Y ] = 1⇒ ρ[X ] ≤ ρ[Y ].

Subadditivity: ∀X ,Y ∈ A, ρ[X + Y ] ≤ ρ[X ] + ρ[Y ].

Positive Homogeneity: ∀X ∈ A, α ≥ 0, ρ[αX ] = αρ[X ].

Translation Invariance: ∀X ∈ A, a ∈ R, ρ[X + a] = ρ[X ] + a.

Convexity: ∀X ,Y ∈ A, λ ∈ [0, 1],ρ[λX + (1− λ)Y ] ≤ λρ[X ] + (1− λ)ρ[Y ]

A risk measure that satisfies Monotonicity, Subadditivity, PositiveHomogeneity, and Translation Invariance is said to be Coherent.


Risk Measures

Note that Translation Invariance ⇒ ρ[X − ρ[X ]

]= 0.

In the insurance field, ρ[X ] is seen as the solvency capital requiredby a regulating agency for a company exposed to loss (risk) X .

The capital is a requirement imposed on the firm so that the surplusof assets over liabilities for the company is at least ρ[X ].

ρ[X ] should be chosen to ensure that X > ρ[X ] is a sufficiently rareevent.

Regulators should, and do, take into consideration the costs incurredby a company that is required to hold excessive solvency capital.Hence, a balance must be sought between solvency and onerouscapital requirements. The choice of ρ should be decided upon withthese thoughts in mind.


Premium Principles vs Risk Measures

Companies or individuals who wish to exchange a future liability foran up-front payment seek out willing partners. They find them in theinsurance industry.

However, insurance firms pricing the risk of being liable for a futurepayment is coupled must also meet the demands of regulators toremain solvent.

One of the most common methods is the Expected Value PremiumPrinciple (EPP) which returns ρ[X ] = E[X ].

However, such a principle doesn’t take into account possible (large)deviations from the average loss. Some corrections:


Premium Principles vs Risk Measures

EPP with loading: ρ[X ] = (1 + α)E[X ] for some α > 0.

Standard Deviation Premium Principle: ρ[X ] = E[X ] + α√

V [X ]for some α ≥ 0.

Variance Premium Principle: ρ[X ] = E[X ] + αV [X ] for someα ≥ 0.

Dutch Premium Principle: ρ[X ] = E[X ] + θE[(X − αE [X ])+] forsome α ≥ 1, θ ∈ [0, 1].


Value at Risk

Let X ∈ L0(Ω,F ,P) and α ∈ [0, 1]. Define

VaRα[X ] := Qα[X ] = inf x | P[X ≤ x ] ≥ α (110)

The VaR is the amount of cash a company requires to set the probabilityof the company becoming insolvent equal to α. The Basel Committee onBanking has in the past set VaR as a measure to set capital requirements.VaR is also used in the Portfolio Premium Pricing Principle in actuarialsettings.


Example: Value at Risk

Consider now the distribution of Losses for accident claims from randomlyselected policies is as follows:

Claim Amount L Number of Policies Probability

0 400 0.607

100 200 0.303

200 50 0.076

500 8 0.012

1000 1 0.002

Compute the Value at Risk for the 90%, 95% and 99% levels.


Example: Value at Risk

Consider now the distribution of Losses for accident claims from randomlyselected policies is as follows:

Claim Amount L P[L ≤ x ]

1000 1

500 0.998

200 0.986

100 0.910

0 0.607

It follows that

(VaR0.90[L],VaR0.95[L],VaR0.99[L]) = (100, 200, 500). (111)


Why Choose VaR as a Risk Measure?

Consider that a firm is allowed to penalize for the cost of capital by apre-determined factor 0 < ε ≤ 1.

Define the (penalized) Expected Shortfall via

ESε(ρ[X ]) := E[(X − ρ[X ])+

]+ ερ[X ] (112)

Question: What capital u = ρ[X ] level solves

minu∈R+

ESε(u)? (113)


VaR as Minimal Capital Requirement (DGK [2003])

Theorem

The smallest capital ρ[X ] that solves

minu∈R+

ESε(u) (114)

is given by ρ[X ] = Q1−ε[X ].

There is a nice geometric proof for this, given by the authors.


VaR as a Minimal Capital Requirement (DGK [2003])

Proof.

(Sketch:) Assume that X has a smooth distribution. The critical point isobtained via

0 =d

du

(E[(X − u)+] + εu

)=

d

du

(∫ ∞0

P[(X − u)+ > x ]dx + εu)

=d

du

(∫ ∞u

P[X > x ]dx + εu)

=d

du

(∫ ∞u

(1− FX (x))dx + εu)

= FX (u)− 1 + ε

→ ρ[X ] = u = F−1X (1− ε) = Q1−ε[X ].

(115)


Value at Risk

Note that VaRα[X − VaRα[X ]

]= 0

This implies that the risk of holding an asset with uncertain outcomeX can be neutralized by holding a reserve amount VaRα[X ].

But VaR doesn’t account for how big the loss for the company in theevent of insolvency.

And regarding diversification..See Example 3.14 in our textbook andexample on page 19 of Hardy Study Note


http://www.casact.org/library/studynotes/hardy4.pdf

Value at Risk and Solvency II

From the European Commission FAQ on Solvency II:

”The aim of a solvency regime is to ensure the financial soundness ofinsurance undertakings, and in particular to ensure that they cansurvive difficult periods. This is to protect policyholders (consumers,businesses) and the stability of the financial system as a whole. ”

” ..insurers must have available resources sufficient to cover both aMinimum Capital Requirement (MCR) and a Solvency CapitalRequirement (SCR). The SCR is based on a Value-at-Risk measurecalibrated to a 99.5% confidence level over a 1-year time horizon.The SCR covers all risks that an insurer faces (e.g. insurance, market,credit and operational risk) and will take full account of any riskmitigation techniques applied by the insurer (e.g. reinsurance andsecuritisation).. ”


http://ec.europa.eu/internal_market/insurance/docs/solvency/solvency2/faq_en.pdf

Some other (related) Risk Measures

The quantile measure VaR doesn’t give us information on the tail behaviorof X beyond Qp[X ]. One way to account for this is to define

TVaRp[X ] :=1

1− p

∫ 1

pQq[X ]dq. (116)

Furthermore, in the tail region, we can also determine how much the lossX exceeds Qp[X ], on average, by

CTEp[X ] := E[X | X > Qp[X ]]. (117)

Finally, regulators are interested in the average shortfall

ESFp[X ] := E[(X − Qp[X ])+]. (118)


Equivalence of Risk Measures [DGK (2006)]

Theorem

TVaRp[X ] = Qp[X ] +1

1− pESFp[X ] (119)


Equivalence of Risk Measures [DGK (2006)]

Proof.

Under the transformation x → q, where x = F−1X (q) = Qq[X ] and

(q, dq) =(

FX (x), fX (x)dx)

ESFp[X ] =

∫ ∞−∞

(x − Qp[X ])+fX (x)dx

=

∫ 1

0(Qq[X ]− Qp[X ])+dq

=

∫ 1

pQq[X ]dq − (1− p)Qp[X ]

⇒ TVaRp[X ] =1

1− p

∫ 1

pQq[X ]dq =

ESFp[X ]

1− p+ Qp[X ].

(120)


Equivalence of Risk Measures: Some Points

Note that if E[X ] <∞ and if FX is continuous, then

limp→0

TVaRp[X ] = E[X ]

TVaRp[X ] = CTEp[X ](121)


Normally Distributed Risk

For an X ∼ N(µ, σ2), it can be shown for 0 < p < 1 that

Qp[X ] = µ+ σΦ−1(p)

CTEp[X ] = E[X | X > Qp[X ]] =1

1− p

∫ ∞Qp

1√2πσ

ye−(y−µ)2

2σ dy

=1

1− p

∫ ∞Qp−µσ

µ+ σz√2πσ

e−z2

2 dz

=µ

1− p

∫ ∞Φ−1(p)

e−z2

2

√2πσ

dz +σ

1− p

∫ ∞Φ−1(p)

ze−z2

2

√2πσ

dz

= µ+σ

1− pφ(Φ−1(p))

E[(X − ρ[X ])+] = σφ(ρ[X ]− µ

σ

)− (ρ[X ]− µ)

[1− Φ

(ρ[X ]− µσ

)]ESFp[X ] = σφ

(Φ−1(p)

)− σ(1− p)Φ−1(p)

(122)Albert Cohen (MSU) STT 459: Actuarial Models MSU Spring 2015 114 / 193

Lognormally Distributed Risk

For a ln X ∼ N(µ, σ2), it can be shown for 0 < p < 1 that

Qp[X ] = eµ+σΦ−1(p)

E[(X − ρ[X ])+] = eµ+σ2

2 Φ

(σ +

µ− ln ρ[X ]

σ

)− ρ[X ]Φ

(µ− ln ρ[X ]

σ

)

ESFp[X ] = eµ+σ2

2 Φ

(σ − Φ−1(p)

)− eµ+σΦ−1(p)(1− p)

CTEp[X ] = eµ+σ2

2

Φ

(σ − Φ−1(p)

)1− p

(123)


Pareto Distributed Risk

If SX (x) =(

θθ+x

)α, then

Qp[X ] = θ[(1− p)−1α − 1]

CTEp[X ] = Qp[X ] +Qp[X ] + θ

α− 1.

(124)

HW:

Compute ESFp[X ].

3.31, 3.32, 3.35, 3.36.

Show that CTEp[X ] is, in general, not subadditive.

How about ESFp[X ] - is it subadditive ?

How about ρ[X ] := E[X ] - is it subadditive ?


HW: CTEp[X ]

Consider again the empirically observed distribution of Losses for accidentclaims from randomly selected policies is as follows:

Claim Amount L Probability

0 0.607

100 0.303

200 0.076

500 0.012

1000 0.002

Compute the CTEp[X ] for p ∈ 0.90, 0.95, 0.99. Consult the HardyStudy Note for cases where the VaR levels Qp = Qp+ε for some ε > 0.


Thoughts on CTEp[X ]

CTEp[X ] ≥ Qp[X ], making it more preferable in the eyes of aregulator who always desires higher solvency capital.

CTEp[X ] is used for equity-linked life contingent insurance in Canada.

Minimum Capital Test For Federally Regulated Property and CasualtyInsurance Companies report from the Government of Canada:”..(u)nder the MCT, regulatory capital requirements for various risksare set directly at a pre-determined target confidence level. OSFI haselected 99% of the expected shortfall (conditional tail expectation orCTE 99%) over a one-year time horizon as a target confidence level”,with the footnote ”As an alternative, a value at risk (VaR) at99.5% confidence level or expert judgement was used when it was notpractical to use the CTE approach.”

CTEp is not the only non-subadditive measure, apparently..

What to do? Is there a ”better” property than sub-additivity?


http://www.osfi-bsif.gc.ca/eng/fi-if/rg-ro/gdn-ort/gl-ld/pages/mct2015.aspx

Distortion Risk Measures

Recalling actuarial terminology, for the loss distributionF : R+ → [0, 1], define the Survival Function

S(x) = 1− F (x). (125)

Define the Distortion Function g : [0, 1]→ [0, 1] withg(0) = 0, g(1) = 1, and g increasing.Define the Distortion Risk Measure that inflates the probability oflarge losses via

ρg [X ] =

∫ ∞0

g(S(x))dx

= Eg [X ].

(126)

Theorem

DVTGKV (2004) If g is also concave, then for any losses X ,Y ∈ A,

P[(X ,Y ) ∈ R2+] = 1⇒ ρg [X + Y ] ≤ ρg [X ] + ρg [Y ]. (127)


http://www.math.yorku.ca/~efurman/MATH4280_2013/RiskMeasuresDhaene.pdf

Distortion Risk Measures

HW Are CTEp[X ],ESFp[X ],TTVaRp[X ] and Qp[X ] Distortion RiskMeasures ? Proof or counterexamples..

Hint: Tryg(x) = 1x≥1−p

g(x) = min

x

1− p, 1

(128)

and use the relationships obtained for the above risk measures.What about probability masses?


Proportional Hazard Transform

For α ≥ 1, define

g(s) = s1α (129)

If X ∼ Pareto(γ, θ), then E[X ] = θγ−1 and (assuming γ > α,)

S(x) = P[X > x ] =( θ

θ + x

)γg(S(x)) =

( θ

θ + x

) γα

⇒ ρg [X ] =

∫ ∞0

( θ

θ + x

) γα

dx

=θ

γα − 1

(130)


Proportional Hazard Transform

For α ≥ 1, define

g(s) = s1α (131)

If X ∼ exp(λ), then E[X ] = 1λ and (assuming γ > α,)

S(x) = P[X > x ] = e−λx

g(S(x)) = e−λαx

⇒ ρg [X ] =

∫ ∞0

e−λαxdx

=α

λ

(132)

HW: How about if X ∼ N(µ.σ2)?


Dual Power Transform

Define g(s) = 1− (1− s)N . Note that if N ∈ N, then for N i.i.d. copies ofX : i.e. X1,X2, ..,XN it follows that

g(S(x)) = 1− (1− S(x))N = P[max X1,X2, ..,XN > x ]

∴ ρg [X ] =

∫ ∞0

g(S(x))dx = E[max X1,X2, ..,XN].(133)

HW: Compute ρg [X ] for X ∼ exp(λ). Use (λ,N) = (0.1, 10) andwhatever numerical tools you feel comfortable with.


The Wang Transform

Wang, in A Universal Framework for Pricing Financial and Insurance(2002) presents the following idea: Define λ ∈ R and a distortion functiong such that

g(S(x)) = Φ(

Φ−1(S(x)) + λ). (134)

Then there is a universal pricing method based on this transform forfinancial assets or liabilities over a time horizon [0,T ]. In this distortionmeasure, λ is the market price of risk, reflecting the level of systematicrisk.

HW: Consider an X ∼ N(µ, σ2). How does the Wang Transform acton X ? Specifically, compute ρg [X ]!

HW: Consider an X ∼ LN(µ, σ2). How does the Wang Transformact on X ? Specifically, compute ρg [X ]. How does ρg [X ] relate to thequantile measure Qp[X ] in this case?


http://www.casact.org/library/ASTIN/wang.pdf

http://www.casact.org/library/ASTIN/wang.pdf

Spectral Risk Measure

Assume a non-negative, non-increasing, right-continuous, integrablefunction γ : [0, 1]→ R+ such that∫ 1

0γ(p)dp = 1 (135)

and the corresponding risk measure

ργ [X ] :=

∫ 1

0γ(p)Qp[X ]dp. (136)

Then ργ [X ] is a Spectral Risk Measure.

HW: Consider X ∼ exp(λ), and γ(p) = 2p. Compute ργ [X ].

Is Expected Shortfall a Spectral Risk Measure? How about Value atRisk? How about just Expected Value?

Are Distortion and Spectral Risk Measures related? Check outTheorem 3.1 here for a nice connection.


http://www.unav.edu/documents/29147/425795/wp1506.pdf

Regulators, Economic Capital, and Expected Shortfall

In the case of choosing an adequate capital requirement, a regulator ischarged with making the event X > ρ[X ] ”unlikely.” A selection processcan entail first choosing the a shortfall measure M, where M satisfies

ρ1[X ] ≤ ρ2[X ]⇒ M

[(X − ρ1[X ])+

]≥ M

[(X − ρ2[X ])+

]. (137)

This reflects the view of the view of the regulator that an increase ofsolvency capital implies a reduction of the shortfall risk that M measures.

A choice of a monotonic M is enough to guarantee the above condition.



Note that this condition relates the view of the regulator thatincreasing regulatory capital decreases the likelihood of insolvency.

As mentioned earlier, this view must compete with the cost of holdingexcess capital.

Given the choice of regulatory measure M, the optimal solutionu = ρ[X ] solves

minu∈R+

(E[M[(X − u)+]

]+ εu

). (138)



It was also shown earlier that for M[x ] = x , the optimal solvencycapital is ρ[X ] = Q1−ε[X ].

It follows from our Equivalence Of Risk Measures Theorem that theminimized expected penalized shortfall using this capital is also themeasure

minu∈R+

(E[(X − u)+

]+ εu

)= εTVaR1−ε[X ]. (139)

From the view of a company that must self-determine the optimalsolvency capital, the problem mentioned above is also known asdetermining the economic capital required.

Please refer to Economic Capital Allocation Derived From RiskMeasures [DGK (2003)] for more on economic capital from anactuarial perspective.

Great overall review paper is Risk Measures and Comonotonicity: AReview [DGK (2006)]


https://www.soa.org/Library/Journals/NAAJ/2003/april/naaj0304_3.aspx

https://www.soa.org/Library/Journals/NAAJ/2003/april/naaj0304_3.aspx

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=892185

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=892185

Representation Theorem for Coherent Measures

The original authors Artzner, P., Delbaen, F., Eber, J.-M., Heath, D.(1999) Coherent measures of risk, Mathematical Finance 9(3), 203-228.state and prove the foundational theorem:

Theorem

For a finite set of outcomes, ρ is a coherent risk measure if and only if ∃ afamily P of probability measures on the set of outcomes such that

ρ[X ] = supEQ[X ] : Q ∈ P

. (140)

To see how this relates to the margin system SPAN (Standard PortfolioAnalysis of Risk) developed by the Chicago Mercantile Exchange, pleaselook through Section 3.1 here and the references within.


https://people.math.ethz.ch/~delbaen/ftp/preprints/CoherentMF.pdf

Introduction

An estimator is a quantity, rule, or formula that estimates properties of aset of values. For example, if our set is A = x1, x2, ..., xn, then acommon estimator is

x :=

∑nj=1 xj

n. (141)

A single estimate that results in a value for a an unknown parameter θ iscalled a point estimator θ. Such an estimator is called unbiased if

bias(θ) := E[θ − θ] = 0. (142)


Example 46.1 (Finan)

A population consists of the values 1, 3, 5, and 9. We want to estimate themean of the population µ. A random sample of two values from thispopulation is taken without replacement, and the mean of the sample µ isused as an estimator of the population mean µ.

Find the probability distribution of µ

Is µ an unbiased estimator?


Example 46.1 (Finan)

We find that there are six possible outcomes for this experiment,consisting of 6 pairs of values drawn:

S = 1, 3 , 1, 5 , 1, 9 , 3, 5 , 3, 9 , 5, 9 . (143)

Each of these pairs is equally likely, and so we assign probability 16 to each

of the possible values of µ:

µ ∈ 2, 3, 4, 5, 6, 7 . (144)

and so it follows that

µ =1 + 3 + 5 + 9

4= 4.5

E[µ] =1

6(2 + 3 + 4 + 5 + 6 + 7) = 4.5

∴ bias(θ) = E[µ− µ] = 0.

(145)


Asymptotically Unbiased and Consistent Estimators

If, as the sample size increases, the bias vanishes we say that the estimatorθn is asymptotically unbiased if forall θ :

limn→∞

E[θn − θ] = 0. (146)

If, for all δ > 0,

limn→∞

P[| θn − θ |> δ] = 0 (147)

then θn is a consistent estimator.


Theorem

Theorem

If limn→∞ E[θn − θ] = 0 = limn→∞Var[θn], then θn is consistent.

Proof.

Fix a δ > 0. By Markov’s Inequality,

0 ≤ P[|θn − θ| > δ] ≤ E[(θn − θ)2]

δ2=

E[(

(θn − E[θn]) + (E[θn]− θ))2]

δ2

=Var[θn]− 2E[θn − θ] · E[θn − E[θn]] + E[(E[θn]− θ)2]

δ2→ 0.

(148)


HW: Asymptotically Unbiased and Consistent Estimators

Assume that X1, ...,Xn are i.i.d., and for all i ∈ 1, 2, ..n, Xi ∼ exp(λ).

Are there any values λ > 0 such that

λn = max X1, ..,Xnλn = min X1, ..,Xn

(149)

as estimators of E[X ] = 1λ are Asymptotically Unbiased? Consistent?

If not, can we multiply them by scaling functions of n to get thesedesired properties?


Mean-Squared Error (MSE)

The mean-squared error associated with an estimator θ is

MSE(θ) = E[(θ − θ)2] = Var[θ] + bias2(θ). (150)

If two unbiased estimators θ1, θ2 are such that MSE (θ1) < MSE (θ2),then we say that θ1 is more efficient than θ2.

If θ1 is more efficient that any other unbiased estimator θ, then wesay that θ1 is a uniformly minimum variance unbiased estimator.

HW:

Find MSE(λ) for the previous HW problem.

HW: Practice problems 46.1− 46.4, 46.7 in Finan.


http://faculty.atu.edu/mfinan/actuarieshall/CGUIDE.pdf

Example: Estimator for Uniform Random Variable

Assume that X1, ...,Xn are i.i.d., and for all i ∈ 1, 2, ..n, Xi ∼ U(0, L).Define Ln as an estimator of L such that

Ln = max X1, ..,Xn . (151)

Is Ln Asymptotically Unbiased? Consistent? Asympotically zero -MSE?



By definition, if we set FLn(y) := P[Ln ≤ y ], then

FLn(y) =

(y

L

)nfLn(y) =

nyn−1

Ln.

(152)



It follows that

Var[Ln] + bias2(Ln) = MSE(Ln)

= E[(Ln − L)2] =

∫ L

0(y − L)2 nyn−1

Lndy

=n

Ln

∫ L

0(y 2 − 2Ly + L2)yn−1dy

=3L2

(n + 1)(n + 2)→ 0.

(153)

So not only is Ln asympotically zero -MSE, but also Var[Ln]→ 0 andbias(Ln)→ 0 and so Ln is consistent by our previous theorem.


Method of Moments and Matching Percentile

For a random variable X with parameters θ1, ..., θp, defineθ = (θ1, ..., θp) and

F (x | θ) = E[1X≤x | θ]

µ′k(θ) = E[X k | θ]

µ′k =1

n

n∑i=1

(xi )k

(154)

for a sequence of realizations X1 = x1 , ..., Xn = xn.



A method-of-moments estimate of the vector θ is any solution to thesystem of p equations

µ′k(θ) = µ′k . (155)

The smoothed empirical estimate of the 100g th percentile is obtained byarranging the observed values x1, ..., xn from smallest to largest:

x1, ..., xn →

x(1), ..., x(n)

(156)

and finding the integer p such that pn+1 ≤ g ≤ p+1

n+1 .



From this, we define

ˆVaRg [X ] := [p + 1− (n + 1)g ]x(p) + [(n + 1)g − p]x(p+1). (157)

and the method of matching percentiles corresponds to matching

F ( ˆVaRgk [X ] | θ) = gk ∀k ∈ 1, ..., p (158)

and a set of arbitrarily chosen percentiles g1, ..., gp.

See Examples 58.5 and 58.8 in Finan for an application of matchingpercentiles.

See Examples 58.7 and 58.10 in Finan for an application of method ofmoments.

HW: Practice problems 58.6− 58.10 in Finan.





SOA Exam C Practice Question #1

You are given:

Losses X follow a Losses follow a loglogistic distribution withcumulative distribution function characterized by parameters (θ, γ):

FX (x) =

(xθ

)γ1 +

(xθ

)γ , ∀x ≥ 1 (159)

A random sample of losses is

Sample Losses = 10, 35, 80, 86, 90, 120, 158, 180, 200, 210, 1500 .(160)

Calculate the estimate of θ by percentile matching, using the 40th and80th empirically smoothed percentile estimates.



The 40th percentile is the .4(12) = 4.8th smallest observation. Byinterpolation, it is .2(86) + .8(90) = 89.2. The 80th percentile is the.8(12) = 9.6th smallest observation. By interpolation it is.4(200) + .6(210) = 206. The equations to solve are

0.4 =

(89.2θ

)γ1 +

(89.2θ

)γ0.8 =

(206θ

)γ1 +

(206θ

)γ⇒ (γ, θ) = (2.1407, 107.08).

(161)



You are given:

A sample x1, x2, , x10 is drawn from a distribution with probabilitydensity function (with θ > σ):

fX (x) =1

2

(1

θe−

xθ +

1

σe−

xσ

), ∀x ∈ (0,∞) (162)

Summed values10∑i=1

xi = 150

10∑i=1

x2i = 5000.

(163)

Estimate θ by matching the first two sample moments to thecorresponding population quantities.



By definition of exponential distributions, we obtain the pair of equations:

150

10= E[X ] =

1

2(θ + σ)

5000

10= E[X 2] =

1

2(2θ2 + 2σ2)

⇒ (θ, σ) = (20, 10).

(164)

The solution can be found via

30 = θ + σ ⇒ σ = 30− θ500 = θ2 + σ2 = θ2 + (30− θ)2 = 2θ2 − 60θ + 900

⇒ θ ∈ 10, 20 .(165)

We choose θ = 20 to fix σ = 10.


Two-sided Uniform Interval Estimation

Define the empirically observed quantities

X :=1

n

n∑k=1

xk

X 2 :=1

n

n∑k=1

x2k

(166)

for a sequence of n observed values X1 = x1, ...,Xn = xn . Assume thatthe Xini=1 are i.i.d. and drawn from a population X ∼ U[a, b]. Use

method of moments to determine a, b in terms of X ,X 2.


Two-sided Uniform Interval Estimation

By definition,

E[X ] =a + b

2

E[X 2] =a2 + b2 + ab

3.

(167)

Estimation via matching of the first two moments:

X =a + b

2

X 2 =a2 + b2 + ab

3.

(168)

Algebra leads to

(a, b) =

(X −

√3

√X 2 − X

2,X +

√3

√X 2 − X

2). (169)


Maximum Likelihood Estimation for Complete Data

Assume again a probability space (Ω,F ,P), where a subset Aini=1 of Fis contained within Ω, and

∪ni=1Ai ⊆ Ω. (170)

Here, the Ai represent observations obtained from an experimentconsisting of a sequence of i.i.d. random variables Xini=1, pulled from acommon distribution F (· | θ).



The idea now is to use information obtained by the outcomes observedXi ∈ Ai. In fact, we determined that the likelihood function

L(θ) := Πni=1P[Xi ∈ Ai | θ] (171)

should be maximized, due to our observation that this independentsequence of events has occurred!



For ease of computation, we may wish to maximize the log-likelihoodfunction

l(θ) := ln [L(θ)] =n∑

i=1

ln (P[Xi ∈ Ai | θ]). (172)



You are given:

Losses X follow a Single-parameter Pareto distribution with densityfunction characterized by a parameter α ∈ (0,∞)

fX (x) =α

xα+1, ∀x > 1 (173)

A random sample of size five produced three losses with values 3, 6and 14, and two losses exceeding 25.

Determine the maximum likelihood estimate of α.



By definition, SX (x) = 1xα and so

L(α) = fX (3)fX (6)fX (14)SX (25)2 =α

3α+1

α

6α+1

α

14α+1

1

252α

=1

252

α3

157500α

l(α) = ln [L(α)] = 3 lnα− α ln (157500)− ln (252)

⇒ l ′(α) =3

α− ln (157500)

⇒ α =3

ln (157500)= 0.2507.

(174)



You are given:

Losses X follow an exponential distribution with mean θ

A random sample of 20 losses is distributed as follows:

Loss Range Frequency

[0,1000] 7

(1000,2000] 6

(2000,∞) 7

Determine the maximum likelihood estimate of θ.



Our loss function is, for p := e−1000θ ,

L(θ) = F (1000)7[F (2000)− F (1000)]6[1− F (2000)]7

=(

1− e−1000θ

)7 (e−

1000θ − e−

2000θ

)6 (e−

2000θ

)7

L(p) = (1− p)7(p − p2)6(p2)7 = p20(1− p)13

l(p) = 20 ln p + 13 ln (1− p)

l ′(p) =20

p− 13

1− p= 0

⇒ p =20

33= 1− e−

10001996.90

⇒ θ = 1996.90.

(175)



See Examples 59.1− 59.5 in Finan for an application of maximumlikelohood estimators to loss models.

HW: Practice problems 59.2− 59.7 in Finan.

HW: Read Section 60 in Finan on Maximum Likelihood Estimationfor Incomplete Data.

HW: Read Section 67 in Finan on Estimation of Class (a, b, 0).

HW: Practice problems 67.2.2− 67.4 in Finan.

Note: Take-home final exam may have problems very similar in scopeto those above!








Introduction

Assume now that there is a history of claims X1, ...,Xn, all with acommon risk parameter Θ such that the Xk | Θnk=1 are i.i.d.

Furthermore, based on the history we have observed, we would like to useit to estimate the (common) expected loss

µ(Θ) := E[Xn+1 | Θ]. (176)


Introduction

The question to ask is how Credible the data X1, ...,Xn is. As a riskmanager, there is the option to charge the book or manual premium

µ := E[µ(Θ)] (177)

or charge the weighted average premium for some sequence of weightsβ1, ..., βn:

X :=n∑

k=1

βkXk . (178)


Introduction

As in the section on risk measures, we seek a metric Z for the credibilityof the observed claims X1, ...,Xn so far.

This value Z should depend on everything, and measure the likelihoodthat the observed data reflects reality, and should be used for premiumsover the assumed average µ:

Z : (Θ,X1, ...,Xn)→ [0, 1]. (179)

The balance reached between history and assumed µ results in an optimalpremium at time n + 1:

Pn+1 = (1− Z )µ+ Z X . (180)


Limited Fluctuation Credibility

A risk manager may be tempted to give full credibility, Z = 1, toobserved data. In order to justify this, there must be some stability to theobserved data.

By assigning equal weight to the observed claims, αk := 1n , we seek to

determine whether, by the Central Limit Theorem

p ≤ P[∣∣∣∣ X − µµ

∣∣∣∣ ≤ r

]= P

[∣∣∣∣∣ X − µσ√n

∣∣∣∣∣ ≤ rµ√

n

σ

]≈ 2Φ

(rµ√

n

σ

)−1 (181)

for company-determined values of r and p, and assumed parameters(µ, σ2) = (E[Xj ],Var[Xj ]).


Limited Fluctuation Credibility

Hence, the question of full-credibility now reduces to a large enoughnumber of exposure units:

rµ√

n

σ≥ Φ−1

(1 + p

2

)=: yp. (182)

Rewritten in terms of n, the inequality is now

n ≥ λ0

(σ

µ

)2

λ0 :=(yp

r

)2.

(183)


Limited Fluctuation Credibility: Aggregate Claims

Consider now the joint sequence (Ni ,Xi )ni=1, where Ni is the totalnumber of claims in year i , and Xi is the total loss in year i .

Assume that the Xi and Ni are independent of each other, and that theNi ∼ Poisson(λ) are i.i.d. as well as Xi independent with commondistribution FX .

Let Yij denote the j th claim in year i . Assume that the Yij are i.i.d. withmean θY and variance σ2

Y . It follows that

Xi =

Ni∑j=1

Yij . (184)


Limited Fluctuation Credibility: Aggregate Claims

Based on this set-up, the independence of the claim frequency and claimsize leads to

E[Xi ] = E[Ni ] · E[Yij ] = λθY

Var[Xi ] = Var[Ni ] · E[Yij ]2 + E[Ni ] ·Var[Yij ]

= λ(θ2Y + σ2

Y ).

(185)

It follows that for full credibility, we need

n ≥ λ0

σ2Xi

E[Xi ]2= λ0

θ2Y + σ2

Y

λθ2Y

(186)



You are given:

The number of claims has a Poisson distribution.

Claim sizes Yij have a Pareto distribution with common values(θY , σ

2Y ) = (0.10, 0.015).

The number of claims and claim sizes are independent.

The observed pure premium should be within 2% of the expectedpure premium 90% of the time.

Determine the expected number of claims needed for full credibility.



The expected number of claims is equal to λn. From the informationprovided, r = 0.02 and y0.90 = 1.645, and so for full credibility, we require

λn ≥ λ ·

(λ0

σ2Xi

E[Xi ]2

)

= λ0θ2Y + σ2

Y

θ2Y

=

(1.645

0.02

)2(1 +

0.0152

0.102

)= 16913.

(187)


Greatest Accuracy Credibility

One way of determining Z ∈ (0, 1) is by optimizing over all linear functionsof past data:

minα1,...αn

E

(µ(Θ)−

[µ(1−

n∑k=1

αk) +n∑

k=1

αkXk

])2 . (188)


Greatest Accuracy Credibility

Using Hilbert space theory, Shiu and Sing show the optimal weights to be

αk =

E[(µ−µ(Θ))2]E[(Xk−µ(Θ))2]

1 +∑n

j=1E[(µ−µ(Θ))2]E[(Xj−µ(Θ))2]

=

Var[µ(Θ)]E[Var[Xk |Θ]]

1 +∑n

j=1Var[µ(Θ)]

E[Var[Xj |Θ]]

1− Z = 1−n∑

k=1

αk =1

1 +∑n

j=1E[(µ−µ(Θ))2]E[(Xj−µ(Θ))2]

=1

1 +∑n

j=1Var[µ(Θ)]

E[Var[Xj |Θ]]

(189)


http://jofap.org/documents/vol11/v11_shui.pdf

Buhlmanns k−factor

From the previous slide, we have

Z =

∑nj=1

Var[µ(Θ)]E[Var[Xj |Θ]]

1 +∑n

j=1Var[µ(Θ)]

E[Var[Xj |Θ]]

. (190)

If, however, we have a common value k such that, for all j ∈ 1, ..., n,

1

k:=

Var[µ(Θ)]

E[Var[Xj | Θ]](191)

then

Z =n 1k

1 + n 1k

=n

n + k. (192)

This common value k is known as Buhlmanns k−factor. Notice that ifn→∞, then Z → 1.


Buhlmanns k−factor

Furthermore, we have that for constant β := E[Var[Xk | Θ]],

βk :=

1E[Var[Xk |Θ]]∑nj=1

1E[Var[Xj |Θ]]

=1

n(193)

and so we can write

Pn+1 = (1− Z )µ+ Z

(X1 + ...+ Xn

n

). (194)

In this setting, we can extend to number of claims or other observedvariables V1, ...,Vn and say that our Buhlmann Credible Estimate forVn+1 is

Vn+1 = (1− Z )µV + Z

(V1 + ...+ Vn

n

). (195)


Compound Poisson Process

Assume the existence of a claim count random variable such thatN ∼ Poisson(λ).

Assume that the claim sizes X1,X2, .. to be i.i.d. copies of arandom variable X .

Assume that X , X1,X2, .. are independent of N.

Based on these assumptions, define the aggregate claim Y | N such that

Y | N :=N∑

k=1

Xk . (196)


Compound Poisson Process

By the Tower Property of Conditional Expectations, and the fact that N isPoisson-distributed, we have

E[Y ] = E[E[Y | N]] = E[NE[X ]] = E[N]E[X ].

Var[Y ] = E[Var[Y | N]] + Var [E[Y | N]]

= E[NVar[X ]] + Var [NE[X ]]

= E[N]Var[X ] + Var [N]E[X ]2

= E[N]Var[X ] + E [N]E[X ]2

= E[N]E[X 2]

(197)



You are given:

Claim counts follow a Poisson distribution with mean θ.

Claim sizes follow an exponential distribution with mean 10θ .

Claim counts and claim sizes are independent, given θ.

The prior distribution has probability density function

fΘ(θ) =5

θ6for θ > 1. (198)

Calculate Buhlmanns k−factor for aggregate losses.



To compute k , we need

µ = E[µ(Θ)] =

∫ ∞1

10θ2 · 5

θ6dθ =

50

3

Var[µ(Θ)] = Var[10Θ2] =

∫ ∞1

(10θ2 − 50

3

)2

fΘ(θ)dθ

=11111

50= 222.22

E[v(Θ)] =

∫ ∞1

200θ3 · fΘ(θ)dθ = 500

⇒ k =E[v(Θ)]

Var[µ(Θ)]=

500

222.22= 2.25.

(200)


Bayesian Parameter Estimation

Another way to use previously observed data to estimate future claims andlosses is via Bayesian Parameter Estimation. In this approach, weassume that we wish to estimate X using observed data by firstconditioning the parameter distribution on observed data :

E[Xn+1 | X1, ...,Xn] =

∫θ∈R

E[Xn+1 | θ]fΘ(θ | X1, ...,Xn)dθ. (201)

Question: Is there a connection between Bayesian Parameter Estimationand Credibility Theory?



You are given:

The size X of a claim is uniformly distributed on the interval [0, θ]

The prior distribution has probability density function

fΘ(θ) =500

θ2for θ > 500. (202)

Two claims, x1 = 400 and x2 = 600, are observed. You calculate theposterior distribution as:

fΘ(θ | X1 = 400,X2 = 600) = 3

(6003

θ4

)for θ > 600. (203)

Calculate the Bayesian premium E[X3 | X1 = x1,X2 = x2].



As X ∼ U[0, θ], we have E[X | Θ] = 12 Θ. It follows that

E[X3 | X1 = 400,X2 = 600] =

∫θ∈R

E[X3 | θ]fΘ(θ | X1 = 400,X2 = 600)dθ

=

∫ ∞600

1

2θ · 3

(6003

θ4

)dθ

= 36003

2

∫ ∞600

1

θ3dθ

= 36003

2

1

2 · 6002= 450.

(204)


Introduction

Recall that if we take a sequence of i.i.d. (independent, identicallydistributed) random variables X1,X2, ...,Xn, called a random sample,with common mean and variance:

E[Xi ] = µ

Var[Xi ] = σ2(205)

then

E

[X1+X2+...+Xn

n − µσ√n

]= 0

Var

(X1+X2+...+Xn

n − µσ√n

)= 1

(206)


Introduction

If the random sample comes from a Normal population, then by theprevious slide, we know

X1+X2+...+Xnn − µσ√n

∼ N(0, 1) (207)

This implies we can use our Observed Sample Mean X1+X2+...+Xnn to

estimate the Population Sample Mean µ as long as we know σ a priori:

P

[−zα

2≤


≤ zα2

]= P

[−zα

2≤ Z ≤ zα

2

]= Φ

(zα

2

)− Φ

(−zα

2

)= 1− α

(208)


Introduction

In fact, we can tune the accuracy of our prediction by how small wechoose α > 0 and how large n of a sample we take. Symbolically, if weobserve our sample mean to be x , then we are 100 · (1− α)% that

µ ∈(

x − zα2

σ√n, x + zα

2

σ√n

)

Some values:

100 · (1− α)% α2 zα

2

90% 0.05 1.645

95% 0.025 1.960

99% 0.005 2.576


Introduction

In fact, we can tune the accuracy of our prediction by how small wechoose α > 0 and how large n of a sample we take. Symbolically, if weobserve our sample mean to be x , then we are 100 · (1− α)% that

µ ∈(

x − zα2

σ√n, x + zα

2

σ√n

)Some values:

100 · (1− α)% α2 zα

2

90% 0.05 1.645

95% 0.025 1.960

99% 0.005 2.576


Interval Width

In general, a confidence interval with width n observations and100 · (1− α)% certainty has a total width

w = 2zα2· σ√

n. (209)

Inverting this for n, we derive the number n observations required for agiven width w , confidence level 100 · (1− α)%, and known deviation σ tobe

n =(

2zα2· σ

w

)2. (210)

For example, if α = 0.05, σ = 25,w = 10, then n =(2 · 1.96 · 25

10

)2= 96.

Notice that n ∝ 1w2 .


Unknown σ

In most cases, one can safely assume that we are also estimating theVariance of the population along with the population mean. By the CLT,we know that for any random sample with mean µ and variance σ2,

P

[−zα

2≤


≤ zα2

]→ 1− α. (211)

If we work only from observed data, then we recall that our SampleMean-Variance pair (X ,S2) has the form

S2 =n∑

i=1

(Xi − X

)2

n − 1

X =X1 + X2 + ...Xn

n

(212)


If we work only from observed data, we can use our modified version ofthe CLT:

Theorem

P

[−zα

2≤ X − µ

S√n

≤ zα2

]→ 1− α. (213)

The speed of convergence depends on the underlying populationdistribution, of course. As a general rule of thumb, when using onlyobserved data, one can consider the above limit achieved if n ≥ 40.

As a computational consideration, recall that

S2 =n∑

i=1

(Xi − X

)2

n − 1

=1

n − 1

n∑i=1

X 2i −

1

n · (n − 1)

(n∑

i=1

Xi

)2

(214)


If we work only from observed data, we can use our modified version ofthe CLT:

Theorem

P

[−zα

2≤ X − µ

S√n

≤ zα2

]→ 1− α. (213)

The speed of convergence depends on the underlying populationdistribution, of course. As a general rule of thumb, when using onlyobserved data, one can consider the above limit achieved if n ≥ 40.

As a computational consideration, recall that

S2 =n∑

i=1

(Xi − X

)2

n − 1=

1

n − 1

n∑i=1

X 2i −

1

n · (n − 1)

(n∑

i=1

Xi

)2

(214)


Bionomial Distribution - Polling ?

Imagine now that we are polling a group of people whose answer is eitheryes or no. Assign the value 1 for an answer of yes, 0 for the answer no.Define Xi to be the numerical value of the i th respondent’s answer, and

p :=

∑ni=1 Xi

n(215)

the sample proportion of positive respondents. The question remains : howgood of an estimation of the total population positive proportion p is p ?.



Luckily, we have seen this Binomial random variable before, and we knowthat given a population proportion p of a yes answer, we have

E[p] := E

[1

n

n∑i=1

Xi

]= p

Var(p) := Var

(1

n

n∑i=1

Xi

)=

p · (1− p)

n

= σ2p

= E[S2] ............... Really ?

(216)

It follows that for large n , we can make the approximation

p − p

σp∼ N(0, 1). (217)




E[p] := E

[1

n

n∑i=1

Xi

]= p

Var(p) := Var

(1

n

n∑i=1

Xi

)=

p · (1− p)

n

= σ2p = E[S2]

............... Really ?

(216)


p − p

σp∼ N(0, 1). (217)




E[p] := E

[1

n

n∑i=1

Xi

]= p

Var(p) := Var

(1

n

n∑i=1

Xi

)=

p · (1− p)

n

= σ2p = E[S2] ............... Really ?

(216)


p − p

σp∼ N(0, 1). (217)



Formally, the modified CLT tells us that

P

−zα2≤ p − p√

p·(1−p)√n

≤ zα2

→ 1− α. (218)



This can be inverted to say that for large n,

P [p− ≤ p ≤ p+] ≈ 1− α

p− =p +

z2α2n − zα

2

√pqn +

z2α2

4n2

1 +z2α2n

p+ =p +

z2α2n + zα

2

√pqn +

z2α2

4n2

1 +z2α2n

(219)


One sided Confidence Bounds

Since

P

[X1+X2+...+Xn

n − µS√n

≤ zα

]→ 1− α

P

[X1+X2+...+Xn

n − µS√n

≥ −zα

]→ 1− α

(220)

we say that

µ < x + zαs√n

is a Large Sample Upper Confidence Bound for µ

µ > x − zαs√n

is a Large Sample Lower Confidence Bound for µ.


T-distributions

If the population of interest is normal, then X1,X2, ...,Xn constitutes anormal random sample with both µ, σ unknown.

Since we need σ to estimate µ, we do the next best thing and substitute Sfor σ, as before.

Tn−1 :=

(1n

∑ni=1 Xi

)− µ

S√n

fTn−1(x) =1√

(n − 1)π

Γ(n2 )

Γ(n−12 )

1(1 + x2

n−1

) n2

→ 1√2π

e−x2

2 =: fZ (x)

E[Tn−1] = 0

Var(Tn−1) =n − 1

n − 3

(221)


T-distributions

If the population of interest is normal, then X1,X2, ...,Xn constitutes anormal random sample with both µ, σ unknown.Since we need σ to estimate µ, we do the next best thing and substitute Sfor σ, as before.

Tn−1 :=

(1n

∑ni=1 Xi

)− µ

S√n

fTn−1(x) =1√

(n − 1)π

Γ(n2 )

Γ(n−12 )

1(1 + x2

n−1

) n2

→ 1√2π

e−x2

2 =: fZ (x)

E[Tn−1] = 0

Var(Tn−1) =n − 1

n − 3

(221)


T-distributions

Define tα,n−1 to satisfy

P[Tn−1 ≥ tα,n−1] = α (222)

It follows that

P[−tα2,n−1 ≤ Tn−1 ≤ tα

2,n−1] = 1− α (223)

and the corresponding two-sided and one-sided confidence intervals can bederived.


Prediction Intervals

Given a normal random sample X1,X2, ...,Xn, we want to predict the next,independent value Xn+1. One very tempting predictor is the historical, orsample, average X := 1

n

∑ni=1 of the previous n values. The residual error

attached with this prediction is X − Xn+1, and we can compute

E[X − Xn+1] = µ− µ = 0

Var(X − Xn+1

)= Var

(X)

+ Var(Xn+1)

=σ2

n+ σ2 = σ2 ·

(1 +

1

n

) (224)


Prediction Intervals

With the above information, it follows that

X − Xn+1√(σ2 ·

(1 + 1

n

)) ∼ N(0, 1)

X − Xn+1√(S2 ·

(1 + 1

n

)) ∼ Tn−1

(225)

and we say that if the sample mean is x , and sample deviation is s, thenthe prediction interval for a single observation from a normal populationis

P

[∣∣∣Xn+1 − x∣∣∣ ≤ tα

2,n−1 · s ·

√1 +

1

n

]= 1− α (226)


Confidence Intervals for Variance

Given a normal random sample X1,X2, ...,Xn, we see that

n − 1

σ2S2 =

∑ni=1(xi − X )2

σ2(227)

has a chi-squared (χ2n−1) distribution with n − 1 degrees of freedom.

Because S2 is always positive, and because the chi-squared distribution isasymmetric, we compute the confidence interval

P[χ2

1−α2,n−1 <

n − 1

σ2S2 < χ2

α2,n−1

]= 1− α

where P[χ2n−1 > χ2

α,n−1]]

= α

(228)


stt 459: actuarial models - michigan state...

Documents