
Page 1: Regression Tree Credibility Model · 2017-01-04 · 2 Buhlmann-Straub Credibility Model In this section, a brie y review of the Buhlmann-Straub credibility model will be presented

Regression Tree Credibility Model

Liqun Diao∗, Chengguo Weng†

Department of Statistics and Actuarial Science,

University of Waterloo, Waterloo, N2L 3G1, Canada

Version: Sep 16, 2016

Abstract

Credibility theory is viewed as a cornerstone of actuarial science. This paper brings machine learning techniques into the area and proposes a novel credibility model and a credibility premium formula based on regression trees, called the regression tree credibility (RTC) premium. The proposed RTC method first recursively partitions a collective of individual risks into exclusive sub-collectives through binary splits, using a new credibility regression tree algorithm based on credibility loss, and then applies the classical Buhlmann-Straub credibility formula to predict individual net premiums within each sub-collective. The proposed method effectively predicts individual net premiums by incorporating covariate information, and it is particularly appealing for capturing various non-linear covariate effects and/or interaction effects because no specific regression form needs to be pre-specified. Our proposed RTC method automatically selects influential covariates for premium prediction, with no additional ex ante variable selection procedure required. The superior prediction accuracy of the proposed RTC model is demonstrated by extensive simulation studies.

Keywords: Credibility Theory; Regression Trees; Premium Rating; Predictive Analytics.

∗ Assistant Professor. Email: [email protected]
† Corresponding Author. Associate Professor. Email: [email protected]


1 Introduction

Risk classification, a key ingredient of insurance pricing, exploits observable characteristics to classify insureds with similar expected claims and builds a tariffication system that enables price discrimination on insurance products. An a priori classification scheme is a premium rating system developed to correctly express measurable a priori information about a new policyholder, i.e., an insured without any claims experience. An a priori classification scheme is often unable to identify all the important factors for premium rating because some are either unmeasurable or unobservable. An a posteriori classification (or experience rating) system is therefore necessary to re-rate risks by integrating claims experience into the rating system, and to obtain a fairer and more reasonable price discrimination scheme.

Credibility theory, viewed as a cornerstone in the field of actuarial science (Hickman and Heacox, 1999), has become the paradigm for insurance experience rating and is widely used by actuaries. Its advent dates back to Whitney (1918). The idea behind the theory is to view the net premium of an individual risk as a function µ(Θ) of a random element Θ representing the unobservable characteristics of the individual risk, and to calculate the (credibility) premium as a linear combination of individual claims experience and the collective net premium. Credibility theory has developed into two streams: limited fluctuation credibility theory and greatest accuracy credibility theory. The former is a stability-oriented theory whose main objective is to incorporate into the premium as much individual claims experience as possible while keeping the premium sufficiently stable. The modern applications of credibility theory mainly rely on the latter, the greatest accuracy theory, which approximates the net premium by minimizing mean square prediction error, thus deriving the best linear unbiased estimator based on both individual and collective claims experience. The theoretical foundation of credibility theory was first established by Buhlmann (1967, 1969), bringing about the famous Buhlmann model, and was subsequently extended in many directions. The model by Buhlmann and Straub (1970) allows individual risks to be associated with distinct volume measures (e.g., risk exposure). Hachemeister (1975) generalized the model to the regression context, and Jewell (1973) extended it to multi-dimensional risks. Since insurance data often show a natural hierarchical structure, the hierarchical credibility model was developed to embody this feature (Jewell, 1975; Taylor, 1979; Sundt, 1980; Norberg, 1986; Buhlmann and Jewell, 1987). Credibility theory encompassed within the framework of generalized linear models can be found in Nelder and Verrall (1997), Ohlsson (2008) and Garrido and Zhou (2009).

In insurance practice, relevant covariates are usually available to partially, if not fully, reflect the risk profile of individual risks (e.g., age, gender, annual income, driving history, etc.). In some of the existing literature, covariate information has been incorporated into various credibility models (e.g., Hachemeister, 1975; Nelder and Verrall, 1997; Ohlsson, 2008; Garrido and Zhou, 2009), but a specific regression form for either the individual net premium or the claim amount on covariates needs to be pre-specified (e.g., linear regression, generalized linear regression, etc.) before the credibility premium is calculated. As such, the available credibility models lack the flexibility to incorporate nonlinear effects and/or interaction effects, which are highly likely to exist in insurance data, and thus carry the risk of model misspecification.

We propose a novel regression tree credibility (RTC) model in this paper, which, for the first time, brings machine learning techniques into the framework of credibility theory and provides encouraging performance in terms of premium prediction accuracy compared to classical credibility models. Regression trees (Breiman et al., 1984) are popular machine learning algorithms developed to construct prediction models by recursively binary partitioning the data space into smaller regions in which a simple model is sufficient. Many interesting extensions of and alterations to regression trees have been proposed, and a thorough review can be found in Loh (2014). Regression trees are powerful non-parametric alternatives to parametric regression methods and are particularly appealing when interest lies in risk classification and prediction. Our proposed RTC model provides a flexible way to utilize covariate information for premium prediction. It takes two steps. In the first step, a novel regression tree algorithm is proposed to utilize covariate information to partition a collective of risks into exclusive sub-collectives such that the risk profiles tend to be homogeneous within each sub-collective and heterogeneous across sub-collectives. The resulting regression trees are called credibility regression trees because they are built to minimize credibility loss in premium prediction. In the second step, the classical credibility premium formula is applied within each sub-collective. Because the eventual credibility premium formula for individual risks results from the implementation of a regression tree algorithm, we call our credibility model the regression tree credibility model.

Our proposed RTC model possesses many advantages over the existing credibility models in the literature. First, for the first time, covariates are exploited to classify the unobservable risk profiles for a credibility model in a prioritized way, whereas they, if considered, are typically used to characterize the conditional net premium of an individual risk, for example, in the regression credibility model of Hachemeister (1975). The well-known hierarchical credibility model (Buhlmann and Jewell, 1987) also utilizes covariate information to classify individual risks, but it does so in an ad hoc manner: the classification in hierarchical credibility models is created according to some inherent hierarchical structure of the insurance data, for example, a certain geographic or demographic structure. In contrast, the RTC model classifies risks in a data-driven manner. Second, our proposed credibility model provides an effective framework to capture nonlinear and interaction effects of covariates on the individual net premium, benefiting from the use of regression trees. In other words, no explicit regression form needs to be assumed for either the individual net premium or the risk distribution in general, whereas all the existing credibility models using covariates require a specific regression form (e.g., Hachemeister, 1975; Nelder and Verrall, 1997). Last but not least, our proposed RTC model automatically selects significant covariates for premium prediction in the course of building the credibility regression tree, and no ex ante variable selection procedure is required.

The rest of the paper proceeds as follows. Section 2 provides a brief review of the Buhlmann-Straub credibility model, Section 3 explains the benefit of partitioning, Section 4 specifies our credibility model, and Section 5 describes the procedure for establishing the credibility regression tree and calculating premiums. Sections 6 and 7 contain simulation results for balanced and unbalanced claims data, respectively. Section 8 concludes the paper. Finally, some technical proofs and additional graphics are provided in the appendices.

2 Buhlmann-Straub Credibility Model

In this section, a brief review of the Buhlmann-Straub credibility model is presented to facilitate the development of our regression tree credibility model in the subsequent sections.

Model 1 (Buhlmann-Straub Credibility Model) Consider a portfolio of I risks, where each individual risk i has n_i years of claims experience, i = 1, 2, . . . , I. Let Y_{i,j} denote the claim ratio of individual risk i in year j, and m_{i,j} be the associated volume measure, also known as the weight variable. The profile of individual risk i is characterized by a scalar θ_i, which is the realization of a random element Θ_i (usually either a random variable or a random vector). Collect all the claims experience of individual risk i into a vector Y_i, i.e., Y_i = (Y_{i,1}, . . . , Y_{i,n_i})^T, and assume that the following conditions are satisfied:

H11. Conditionally given Θ_i = θ_i, {Y_{i,j} : j = 1, 2, . . . , n_i} are independent with

    E[Y_{i,j} | Θ_i = θ_i] = µ(θ_i)  and  Var[Y_{i,j} | Θ_i = θ_i] = σ²(θ_i)/m_{i,j}

for some unknown but deterministic functions µ(·) and σ²(·);

H12. The pairs (Θ_1, Y_1), . . . , (Θ_I, Y_I) are independent, and {Θ_1, . . . , Θ_I} are independent and identically distributed.

Define the structural parameters σ² = E[σ²(Θ_i)] and τ² = Var[µ(Θ_i)] for risks within the collective I := {1, 2, . . . , I}, and denote

    m_i = Σ_{j=1}^{n_i} m_{i,j},  and  m = Σ_{i=1}^{I} m_i.

Let µ = E[Y_{i,j}] denote the collective net premium, and define

    Ȳ_i = Σ_{j=1}^{n_i} (m_{i,j}/m_i) Y_{i,j},  and  Ȳ = Σ_{i=1}^{I} (m_i/m) Ȳ_i,

which are the sample average of claims associated with individual risk i and that of the collective I, respectively.

In the literature, two estimators have been proposed for the individual premium µ(Θ_i) in the above Buhlmann-Straub credibility model: the inhomogeneous credibility premium and the homogeneous credibility premium. The inhomogeneous credibility premium is the best premium predictor P^{(I)}_i among the class

    { P_i : P_i = a_0 + Σ_{i=1}^{I} Σ_{j=1}^{n_i} a_{i,j} Y_{i,j},  a_0 ∈ ℝ, a_{i,j} ∈ ℝ }

minimizing the quadratic loss E[(µ(Θ_i) − P_i)²], or equivalently E[(Y_{i,n_i+1} − P_i)²], and its formula is given by

    P^{(I)}_i = α_i Ȳ_i + (1 − α_i) µ,

where

    α_i = m_i / (m_i + σ²/τ²)    (2.1)

is called the credibility factor for individual risk i (Buhlmann and Gisler, 2005). The corresponding minimum quadratic losses for E[(µ(Θ_i) − P_i)²] and E[(Y_{i,n_i+1} − P_i)²] are respectively given by

    L^{(I)}_{1,i} = σ² / (m_i + σ²/τ²)    (2.2)

and

    L^{(I)}_{2,i} = σ² + σ² / (m_i + σ²/τ²).    (2.3)
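The inhomogeneous premium blends the weighted individual average Ȳ_i with the collective premium µ through the credibility factor α_i of (2.1). A minimal sketch in Python, with invented claim ratios, weights and structural parameters (the function name is ours, not the paper's):

```python
# Minimal sketch of the inhomogeneous Buhlmann-Straub premium
# P_i = alpha_i * Ybar_i + (1 - alpha_i) * mu, with credibility factor
# alpha_i = m_i / (m_i + sigma2/tau2) as in (2.1).
# All numbers below are invented for illustration.

def inhomogeneous_premium(y, m, sigma2, tau2, mu):
    """y: claim ratios Y_{i,j} of one risk; m: matching weights m_{i,j}."""
    m_i = sum(m)
    ybar_i = sum(w * v for w, v in zip(m, y)) / m_i  # weighted individual mean
    alpha_i = m_i / (m_i + sigma2 / tau2)            # credibility factor (2.1)
    return alpha_i * ybar_i + (1.0 - alpha_i) * mu

premium = inhomogeneous_premium(y=[0.9, 1.3, 1.1], m=[1.0, 2.0, 1.0],
                                sigma2=0.5, tau2=0.2, mu=1.0)
```

The resulting premium always lies between the collective mean µ and the individual weighted mean Ȳ_i, with more weight on Ȳ_i as the total volume m_i grows.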

In contrast, the homogeneous credibility premium for an individual risk i from the collective I is defined as the best premium predictor P^{(H)}_i among the class

    { Q_i : Q_i = Σ_{i=1}^{I} Σ_{j=1}^{n_i} a_{i,j} Y_{i,j},  E[Q_i] = E[µ(Θ_i)], a_{i,j} ∈ ℝ }

minimizing the quadratic loss E[(µ(Θ_i) − Q_i)²], or equivalently E[(Y_{i,n_i+1} − Q_i)²], and its formula is given by

    P^{(H)}_i = α_i Ȳ_i + (1 − α_i) Ȳ,    (2.4)

where the credibility factor α_i is the same as in the inhomogeneous premium, given in (2.1). The corresponding minimum quadratic losses for E[(µ(Θ_i) − Q_i)²] and E[(Y_{i,n_i+1} − Q_i)²] are respectively given by

    L^{(H)}_{1,i} = τ²(1 − α_i)(1 + (1 − α_i)/α_•),    (2.5)

and

    L^{(H)}_{2,i} = σ² + τ²(1 − α_i)(1 + (1 − α_i)/α_•),    (2.6)

where α_• = Σ_{i=1}^{I} α_i.

In the implementation of the above credibility premium formulas, the structural parameters σ² and τ² are respectively estimated by

    σ̂² = (1/I) Σ_{i=1}^{I} [1/(n_i − 1)] Σ_{j=1}^{n_i} m_{i,j} (Y_{i,j} − Ȳ_i)²  and  τ̂² = max(τ̃², 0),    (2.7)

where

    τ̃² = c · [ (I/(I − 1)) Σ_{i=1}^{I} (m_i/m)(Ȳ_i − Ȳ)² − I σ̂²/m ],    (2.8)

and

    c = [(I − 1)/I] · [ Σ_{i=1}^{I} (m_i/m)(1 − m_i/m) ]^{−1}.

σ̂² and τ̃² are unbiased and consistent estimators for σ² and τ², respectively, but τ̃² can possibly be negative (p. 62, Buhlmann and Gisler, 2005); thus, τ̂² is used as the estimator for τ².
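The estimators (2.7) and (2.8) translate directly into code. Below is a minimal sketch with made-up toy data; the function and variable names are ours, not from the paper:

```python
# Sketch of the structural-parameter estimators (2.7)-(2.8) of the
# Buhlmann-Straub model. Toy data; names are ours, not the paper's.

def estimate_structural(Y, M):
    """Y[i][j]: claim ratio Y_{i,j}; M[i][j]: weight m_{i,j}."""
    I = len(Y)
    m_i = [sum(Mi) for Mi in M]                       # m_i = sum_j m_{i,j}
    m = sum(m_i)
    ybar_i = [sum(w * v for w, v in zip(Mi, Yi)) / mi
              for Yi, Mi, mi in zip(Y, M, m_i)]       # weighted individual means
    ybar = sum(mi * yb for mi, yb in zip(m_i, ybar_i)) / m

    # sigma2_hat: averaged within-risk weighted sample variance, eq. (2.7)
    sigma2 = sum(sum(w * (v - yb) ** 2 for w, v in zip(Mi, Yi)) / (len(Yi) - 1)
                 for Yi, Mi, yb in zip(Y, M, ybar_i)) / I

    # tau2_tilde of eq. (2.8), truncated at zero as in (2.7)
    c = ((I - 1) / I) / sum((mi / m) * (1 - mi / m) for mi in m_i)
    tau2_tilde = c * ((I / (I - 1)) * sum((mi / m) * (yb - ybar) ** 2
                                          for mi, yb in zip(m_i, ybar_i))
                      - I * sigma2 / m)
    return sigma2, max(tau2_tilde, 0.0)

s2, t2 = estimate_structural(Y=[[1.0, 1.2], [0.8, 0.6], [1.5, 1.7]],
                             M=[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])
```

With these three well-separated risks, the within-risk variance estimate is small relative to the between-risk variance estimate, so individual experience would receive substantial credibility.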

3 Benefit of Partitioning

This section investigates the benefit of partitioning a collective of risks into sub-collectives for premium prediction under the Buhlmann model, which is the special case of the Buhlmann-Straub model reviewed in Section 2 with n_i = n and m_{i,j} = 1 for i = 1, . . . , I and j = 1, . . . , n.

Under the Buhlmann model, it is easy to check that the credibility factor α_i is the same for all individual risks, i.e.,

    α := α_i = n / (n + σ²/τ²),  for i = 1, . . . , I;

the four quadratic losses in (2.2), (2.3), (2.5) and (2.6) respectively become

    L^{(I)}_{1,i} = σ² / (n + σ²/τ²) = τ²(1 − α),

    L^{(I)}_{2,i} = σ² + σ² / (n + σ²/τ²) = σ² + τ²(1 − α),

    L^{(H)}_{1,i} = τ²(1 − α) + (1/I) τ²(1 − α)²/α,

    L^{(H)}_{2,i} = σ² + τ²(1 − α) + (1/I) τ²(1 − α)²/α.
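The first identity can be verified by substitution: 1 − α = (σ²/τ²)/(n + σ²/τ²), so τ²(1 − α) = σ²/(n + σ²/τ²). A quick numerical check with arbitrary parameter values:

```python
# Sanity check of sigma^2/(n + sigma^2/tau^2) = tau^2 * (1 - alpha)
# with alpha = n/(n + sigma^2/tau^2); parameter values are arbitrary.
n, sigma2, tau2 = 5, 0.8, 0.3
kappa = sigma2 / tau2                 # the ratio sigma^2 / tau^2
alpha = n / (n + kappa)
assert abs(sigma2 / (n + kappa) - tau2 * (1 - alpha)) < 1e-12
```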

6

Page 7: Regression Tree Credibility Model · 2017-01-04 · 2 Buhlmann-Straub Credibility Model In this section, a brie y review of the Buhlmann-Straub credibility model will be presented

As I grows large, the second term in the expression of L^{(H)}_{1,i} and the third term in L^{(H)}_{2,i} become negligible, and L^{(H)}_{1,i} and L^{(H)}_{2,i} reduce to L^{(I)}_{1,i} and L^{(I)}_{2,i}, respectively. Therefore, only L^{(I)}_{1,i} and L^{(I)}_{2,i} will be studied regarding the benefit of partitioning in the rest of this section.

Without loss of generality, it will be proved that the overall credibility quadratic loss for any collective I = {1, 2, . . . , I} is not worsened if the collective is split into two mutually exclusive sub-collectives I_1 and I_2, where I_1 ∪ I_2 = I and I_1 ∩ I_2 = ∅, subject to an arbitrary partitioning criterion. When working with the loss function L^{(I)}_{1,i}, the total credibility loss for the collective I is the sum of the individual losses over the collective:

    L = Σ_{i=1}^{I} L^{(I)}_{1,i} = I / (n/σ² + 1/τ²),

where σ² and τ² are the structural parameters for the collective I. We next consider the sum of the credibility losses of the two sub-collectives I_1 and I_2. Let p be the probability that a randomly selected individual risk from the collective I falls into the sub-collective I_1, so that q = 1 − p is the probability that it falls into the sub-collective I_2. On average, there will be pI individual risks lying in sub-collective I_1 and qI individual risks in sub-collective I_2. Let µ_{(k)}, σ²_{(k)} and τ²_{(k)} be the structural parameters for sub-collective I_k, k = 1, 2, defined in the same manner as µ, σ² and τ² introduced in Section 2, but based on sub-collective information. The expected total credibility losses for sub-collectives I_1 and I_2 are, respectively, given by

    L_1 = Ip / (n/σ²_{(1)} + 1/τ²_{(1)})  and  L_2 = Iq / (n/σ²_{(2)} + 1/τ²_{(2)}).

Further, it is easy to check that σ² = p σ²_{(1)} + q σ²_{(2)} and

    τ² = p τ²_{(1)} + q τ²_{(2)} + pq(µ_{(1)} − µ_{(2)})² = τ̄² + pq(µ_{(1)} − µ_{(2)})²,    (3.9)

where τ̄² = p τ²_{(1)} + q τ²_{(2)}.

Proposition 3.1 The credibility losses L, L_1 and L_2 defined in this subsection satisfy

    L_1 + L_2 ≤ L.    (3.10)

Proof. See Appendix A. □

When working with the individual credibility loss L^{(I)}_{2,i}, the total credibility loss for the collective I is given by

    L′ = Iσ² + I / (n/σ² + 1/τ²) = Iσ² + L.

The expected total credibility losses for sub-collectives I_1 and I_2 are respectively of the form

    L′_1 = Ip σ²_{(1)} + Ip / (n/σ²_{(1)} + 1/τ²_{(1)}) = Ip σ²_{(1)} + L_1

and

    L′_2 = Iq σ²_{(2)} + Iq / (n/σ²_{(2)} + 1/τ²_{(2)}) = Iq σ²_{(2)} + L_2.

It is easily shown that L′_1 + L′_2 ≤ L′ by applying Proposition 3.1 and the fact that σ² = p σ²_{(1)} + q σ²_{(2)}.
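To see Proposition 3.1 and its analogue for L′ at work, one can plug hypothetical structural parameters for a two-way split into the formulas above; all numbers below are invented for illustration:

```python
# Numerical illustration of L1 + L2 <= L (Proposition 3.1) and of
# L1' + L2' <= L' under the Buhlmann model; all parameters are invented.
I, n = 100, 5
p, q = 0.4, 0.6
mu1, mu2 = 1.0, 2.0                   # sub-collective net premiums
s1, s2 = 0.5, 0.7                     # sigma^2_(1), sigma^2_(2)
t1, t2 = 0.2, 0.3                     # tau^2_(1),   tau^2_(2)

sigma2 = p * s1 + q * s2                               # pooled sigma^2
tau2 = p * t1 + q * t2 + p * q * (mu1 - mu2) ** 2      # eq. (3.9)

L = I / (n / sigma2 + 1 / tau2)
L1 = I * p / (n / s1 + 1 / t1)
L2 = I * q / (n / s2 + 1 / t2)
assert L1 + L2 <= L                                    # Proposition 3.1

Lp = I * sigma2 + L                                    # L' = I*sigma^2 + L
Lp1, Lp2 = I * p * s1 + L1, I * q * s2 + L2
assert Lp1 + Lp2 <= Lp
```

Here the gap between the sub-collective means (µ_{(1)} ≠ µ_{(2)}) inflates the pooled τ² through the pq(µ_{(1)} − µ_{(2)})² term, which is exactly where the benefit of partitioning comes from.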

The discussion above suggests that any partitioning of a collective of individual risks does not deteriorate the prediction accuracy of the credibility premium formulas, provided that the corresponding structural parameters for each resulting sub-collective can be computed without any statistical error. This result provides a theoretical foundation for the use of any partitioning method in premium prediction, and it also encourages the use of prediction models which classify a given collective of risks into a great number of mutually exclusive sub-collectives. As in many other statistical models, however, estimation error is inevitable: the structural parameters are unknown in general, and estimating them from the available data introduces statistical errors. The existence of estimation error restrains one from excessively partitioning the overall collective; instead, one needs to keep a balance between the benefit of partitioning and the adverse effect of estimation errors.

4 Model Setup

Model 2 (Unbalanced Claims Model) Consider a portfolio of I risks numbered 1, . . . , I. Let Y_i = (Y_{i,1}, . . . , Y_{i,n_i})^T be the vector of claim ratios, m_i = (m_{i,1}, . . . , m_{i,n_i}) be the corresponding weight vector, and X_i = (X_{i,1}, . . . , X_{i,p})^T be the covariate vector associated with individual risk i, i = 1, . . . , I. The risk profile of each individual risk i is characterized by a scalar θ_i, which is a realization of a random element Θ_i. The following two conditions are further assumed:

H21. The triplets (Θ_1, Y_1, X_1), . . . , (Θ_I, Y_I, X_I) are independent;

H22. Conditionally given Θ_i = θ_i and X_i = x_i, the entries Y_{i,j}, j = 1, . . . , n_i, are independent, and

    E[Y_{i,j} | X_i = x_i, Θ_i = θ_i] = µ(x_i, θ_i)  and  Var[Y_{i,j} | X_i = x_i, Θ_i = θ_i] = σ²(x_i, θ_i)/m_{i,j}

for some unknown but deterministic functions µ(·, ·) and σ²(·, ·).

The word "unbalanced" in the title of the above model signifies that the model permits distinct lengths of claims experience and distinct volume measures across individual risks in the collective. By setting n_i = n and m_{i,j} = 1 for i = 1, . . . , I and j = 1, . . . , n_i, the above unbalanced claims model reduces to the following balanced claims model.

Model 3 (Balanced Claims Model) Consider a portfolio of I risks numbered 1, . . . , I. Let Y_i = (Y_{i,1}, . . . , Y_{i,n})^T be the vector of claims and X_i = (X_{i,1}, . . . , X_{i,p})^T be the covariate vector associated with individual risk i, i = 1, . . . , I. The risk profile of individual risk i is characterized by a scalar θ_i, which is a realization of a random element Θ_i. The following two conditions are further assumed:

H31. The triplets (Θ_1, Y_1, X_1), . . . , (Θ_I, Y_I, X_I) are independent;

H32. Conditionally given Θ_i = θ_i and X_i = x_i, the entries Y_{i,j}, j = 1, . . . , n, are independent, and

    E[Y_{i,j} | X_i = x_i, Θ_i = θ_i] = µ(x_i, θ_i)  and  Var[Y_{i,j} | X_i = x_i, Θ_i = θ_i] = σ²(x_i, θ_i)    (4.11)

for some unknown but deterministic functions µ(·, ·) and σ²(·, ·).

The above credibility models are more general than those in the literature, because the existing ones without exception assume specific functional forms for µ(X_i, Θ_i) and σ²(X_i, Θ_i). For example, the well-known Hachemeister model assumes µ(X_i, Θ_i) = X_i^T β(Θ_i), a linear regression form with random coefficient β(Θ_i) depending on the underlying risk profile Θ_i of individual risk i. In contrast, our proposed model leaves the form of the individual net premium µ(X_i, Θ_i) unspecified.

Since partitioning the data space improves premium prediction accuracy, as discussed in Section 3, it is natural to consider partitioning guided by available covariate information. Inspired by ideas from nonparametric statistics, we approximate µ(X_i, Θ_i) and σ²(X_i, Θ_i) by piecewise constant forms, respectively

    Σ_{k=1}^{K} I{X_i ∈ A_k} µ_{(k)}(Θ_i),  and  Σ_{k=1}^{K} I{X_i ∈ A_k} σ²_{(k)}(Θ_i)/m_{i,j},

where I{·} is an indicator function, {A_1, A_2, . . . , A_K} is a partition of the covariate space, K is the number of partitions, and µ_{(k)}(Θ_i) and σ²_{(k)}(Θ_i) respectively represent the net premium and the variance of an individual risk i from the kth sub-collective with risk profile Θ_i, i.e.,

    µ_{(k)}(θ_i) = E(Y_{i,j} | X_i ∈ A_k, Θ_i = θ_i)  and  σ²_{(k)}(θ_i) = Var(Y_{i,j} | X_i ∈ A_k, Θ_i = θ_i).    (4.12)

The condition X_i ∈ A_k means that individual risk i is classified into the kth sub-collective based on its covariate information. Following the derivation of the inhomogeneous credibility premium for the Buhlmann-Straub credibility model reviewed in Section 2, the inhomogeneous credibility premium predicting µ_{(k)}(Θ_i) using information from the kth sub-collective is of the form

    π^{(I)(k)}_i = α^{(k)}_i Ȳ_i + (1 − α^{(k)}_i) µ_{(k)},    (4.13)

where µ_{(k)} = E[Y_{i,j} | X_i ∈ A_k] = E[µ_{(k)}(Θ_i)] is the sub-collective net premium corresponding to the kth partition,

    α^{(k)}_i = m_i / (m_i + σ²_{(k)}/τ²_{(k)})

is the credibility factor of the kth sub-collective, and τ²_{(k)} and σ²_{(k)} are the structural parameters of the kth sub-collective, i.e.,

    τ²_{(k)} = Var[µ_{(k)}(Θ_i)]  and  σ²_{(k)} = E[σ²_{(k)}(Θ_i)].

We develop the following prediction rule based on covariate-dependent partitioning and inhomogeneous sub-collective credibility premiums:

    π^{(I)}_i = Σ_{k=1}^{K} I{X_i ∈ A_k} π^{(I)(k)}_i.    (4.14)

The minimum quadratic losses E[(µ_{(k)}(Θ_i) − π^{(I)(k)}_i)²] and E[(Y_{i,n_i+1} − π^{(I)(k)}_i)²] are respectively

    L^{(I)}_{1,(k)} = Σ_{i=1}^{I} I{X_i ∈ A_k} σ²_{(k)} / (m_i + σ²_{(k)}/τ²_{(k)})    (4.15)

and

    L^{(I)}_{2,(k)} = Σ_{i=1}^{I} I{X_i ∈ A_k} ( σ²_{(k)} + σ²_{(k)} / (m_i + σ²_{(k)}/τ²_{(k)}) ).    (4.16)

Similarly, we can develop the homogeneous credibility prediction rule

    π^{(H)}_i = Σ_{k=1}^{K} I{X_i ∈ A_k} π^{(H)(k)}_i,    (4.17)

where

    π^{(H)(k)}_i = α^{(k)}_i Ȳ_i + (1 − α^{(k)}_i) Ȳ^{(k)},    (4.18)

and Ȳ^{(k)} denotes the average claims experience of the risks in the kth sub-collective. The two homogeneous credibility losses of the kth sub-collective are

    L^{(H)}_{1,(k)} = Σ_{i=1}^{I} I{X_i ∈ A_k} τ²_{(k)}(1 − α^{(k)}_i)(1 + (1 − α^{(k)}_i)/α^{(k)}_•),    (4.19)

and

    L^{(H)}_{2,(k)} = Σ_{i=1}^{I} I{X_i ∈ A_k} [ σ²_{(k)} + τ²_{(k)}(1 − α^{(k)}_i)(1 + (1 − α^{(k)}_i)/α^{(k)}_•) ],    (4.20)

where α^{(k)}_• = Σ_{i=1}^{I} I{X_i ∈ A_k} α^{(k)}_i.
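Given a partition and the structural parameters of each sub-collective, the homogeneous rule (4.17)-(4.18) amounts to applying the Buhlmann-Straub formula group by group. A minimal sketch, with hypothetical group labels, weights and structural parameters (the function name is ours):

```python
# Sketch of the homogeneous prediction rule (4.17)-(4.18): the
# Buhlmann-Straub formula applied within each sub-collective of a
# covariate-based partition. Group labels and numbers are hypothetical.

def homogeneous_premiums(ybar, m, group, sigma2, tau2):
    """ybar[i]: weighted individual mean; m[i]: total weight m_i;
    group[i]: sub-collective index k of risk i; sigma2[k], tau2[k]:
    structural parameters of sub-collective k."""
    tot, wsum = {}, {}
    for yb, mi, k in zip(ybar, m, group):              # sub-collective means
        tot[k] = tot.get(k, 0.0) + mi * yb
        wsum[k] = wsum.get(k, 0.0) + mi
    ybar_k = {k: tot[k] / wsum[k] for k in tot}

    out = []
    for yb, mi, k in zip(ybar, m, group):
        alpha = mi / (mi + sigma2[k] / tau2[k])        # credibility factor
        out.append(alpha * yb + (1 - alpha) * ybar_k[k])
    return out

prem = homogeneous_premiums(ybar=[1.0, 1.2, 0.5, 0.6], m=[2.0] * 4,
                            group=[0, 0, 1, 1],
                            sigma2={0: 0.4, 1: 0.4}, tau2={0: 0.2, 1: 0.2})
```

Each premium is shrunk toward the average of its own sub-collective rather than toward the overall collective average, which is exactly what the partition buys.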


Figure 1: An example of covariate-dependent partitioning.

Figure 1 illustrates one example of covariate-dependent partitioning, in which the covariate X_{i,1} and a threshold c are selected to partition the collective into two sub-collectives G11 and G12. Consequently, the profile of the individual risks in sub-collective G11 is characterized by the conditional distribution of (Θ_i | X_{i,1} > c), whereas that in sub-collective G12 is governed by the conditional distribution of (Θ_i | X_{i,1} ≤ c). If the risk profile variable Θ_i is independent of the covariate X_{i,1}, the two sub-collectives are homogeneous in the sense that the risk profile variables from the two sub-collectives follow the same distribution; in this case, as one can see from the proof of Proposition 3.1, there is no benefit of partitioning, at least for a balanced claims model, because the total credibility loss remains the same, i.e., the inequality in (3.10) becomes an equality, while such a partition will increase the estimation error resulting from less data being available for estimating the structural parameters after a partitioning. Thus, such a covariate is not a significant variable for the data-space partitioning, and a good partitioning scheme should avoid splitting the sample space on such a covariate.

In Figure 1, the sub-collectives G11 and G12 are considered for further partitioning; G11 is selected and partitioned into two further sub-collectives G21 and G22. While the partitioning procedure may go further for a finer partition, Figure 1 demonstrates the case where the sample space is partitioned into three sub-collectives G12, G21 and G22, respectively defined by {X_{i,1} ≤ c}, {X_{i,1} > c & X_{i,2} > d}, and {X_{i,1} > c & X_{i,2} ≤ d}. In an auto insurance example, X_1 may represent gender (male if X_1 = 1 and female if X_1 = 0) and X_2 may represent age. If we choose c = 0 and d = 21, then G21 is the group of male drivers aged over 21, G22 is the group of male drivers aged under 21, and G12 is the group of female drivers. The insurer then offers a discriminated price to each insured group according to its distinct risk profile. The choices of the variables and thresholds used for partitioning are critical for gaining the benefits of partitioning: if they are not properly selected, the benefit from partitioning can be very small, while estimating the structural parameters from the smaller samples in the sub-collectives may lead to large statistical estimation errors.

Generally speaking, an effective partitioning algorithm is expected to partition the data space in a premeditated manner such that the risks are relatively homogeneous within each resulting sub-collective and inhomogeneous between sub-collectives in a certain sense. We propose a novel credibility regression tree algorithm, which chooses the covariates, the cutting points and the order of partitioning in a data-driven manner and forms a partition of the data space which minimizes the overall credibility loss. A credibility regression tree is an alteration of the Classification and Regression Trees (CART; Breiman et al., 1984). A brief review of CART will be given in Section 5.1, and our proposed credibility regression tree will be described in Section 5.2.

5 Regression Tree Credibility Premium

5.1 General Procedure of Regression Tree Methods

Recursive partitioning methods are machine learning techniques for quantifying the relationship between two sets of variables, the responses and the covariates (e.g., Zhang and Singer, 2010). Recursive partitioning methods recursively partition the data space into small regions so that a simple regression model is sufficient in each region. They are powerful alternatives to parametric regression methods and possess many inherent advantages. First, recursive partitioning methods do not require pre-specifying any mathematical form for the regression relationship between the responses and the covariates, and they therefore enjoy more flexibility in capturing nonlinear effects and/or interaction effects of the covariates on the responses. Second, no a priori variable selection procedure is necessary before a regression model is established; variable selection is automatically achieved in the course of model building. Third, the decision structure can be utilized for prediction and easily visualized by a hierarchical graphical model. Recursive partitioning methods have become popular and widely used tools in many scientific fields but, surprisingly, they have rarely been applied in actuarial science so far.

The Classification and Regression Trees (CART) methodology, detailed in Breiman et al. (1984), is perhaps the most well-known recursive partitioning algorithm. Many extensions and alterations have been developed on the basis of CART; see Loh (2014) for a thorough review. CART is formulated in terms of three main steps: (1) tree growing, (2) tree pruning, and (3) tree selection. In what follows, we present a brief description of each of the three steps.

(1) Tree Growing. This step grows a large tree. Consider a data set with a response W and covariates X_i = (X_{i1}, . . . , X_{ip})^T. The partition algorithm begins with the top node, which contains the entire data set, and proceeds through binary splits of the form {X_{ij} ≤ c} versus {X_{ij} > c} (where c is a constant) for an ordinal or continuous covariate X_{ij}, and {X_{ij} ∈ S} versus {X_{ij} ∉ S} (where S is a subset of the values that X_{ij} may take) for a categorical covariate X_{ij}, leading to two mutually exclusive descendant nodes. The principle behind the splitting procedure is heterogeneity reduction in the response distribution, realized by separating the data points into two exclusive descendant nodes in which the responses are more homogeneous within each subset than they were combined in the parent node. The heterogeneity of each node can be measured in various ways depending on the nature of


the underlying problem, and it is usually quantified by a loss function such as the quadratic (L2) loss function: for a node g,

$$L_2(g) = \sum_{i=1}^{I} I\{i \in g\}\,\bigl(W_i - \overline{W}(g)\bigr)^2, \qquad (5.21)$$

where $W_i$ is the ith response in node g and $\overline{W}(g)$ is the sample average of the responses in node g. In the course of tree building, the first split is selected as the one that yields the largest reduction in loss. The second split is chosen as the one leading to the largest loss reduction among all possible splits within the two descendant nodes. The tree keeps splitting until some pre-specified criteria, such as a minimum number of observations in each terminal node or a maximum depth of the tree, are met. This splitting process produces a large tree ψ0, which almost always overfits the data, so a less complex model is needed; it will be built through steps (2) and (3).
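As a concrete illustration of the split search in the growing step, the following Python sketch (our illustration, not the authors' code; `l2_loss` and `best_split` are hypothetical helper names) scans all binary splits {X_j ≤ c} versus {X_j > c} and returns the one with the largest reduction in the L2 loss (5.21):

```python
import numpy as np

def l2_loss(w):
    """Quadratic (L2) loss of a node: sum of squared deviations from the node mean, as in (5.21)."""
    return float(np.sum((w - w.mean()) ** 2)) if len(w) else 0.0

def best_split(X, w):
    """Search all (covariate, cutpoint) pairs {X_j <= c} vs {X_j > c} and
    return the split with the largest reduction in L2 loss."""
    parent = l2_loss(w)
    best = (None, None, 0.0)  # (covariate index, cutpoint, loss reduction)
    for j in range(X.shape[1]):
        for c in np.unique(X[:, j])[:-1]:  # candidate cutpoints (drop max: empty right node)
            left, right = w[X[:, j] <= c], w[X[:, j] > c]
            reduction = parent - l2_loss(left) - l2_loss(right)
            if reduction > best[2]:
                best = (j, float(c), reduction)
    return best

rng = np.random.default_rng(0)
X = rng.integers(1, 101, size=(200, 2)).astype(float)
w = np.where(X[:, 0] <= 50, 1.0, 3.0) + rng.normal(0, 0.1, 200)  # signal on the first covariate only
j, c, red = best_split(X, w)
```

With the signal jumping at 50 on the first covariate, the search recovers that covariate and a cutpoint at (or just below) 50.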

(2) Tree Pruning. This step cuts (prunes) the large tree ψ0 from the previous step to create a sequence of nested subtrees, relying on a cost-complexity pruning algorithm. Let ψg denote the subtree of ψ0 rooted at node g, containing g and all of its descendant nodes, and denote the set of terminal nodes of a tree ψ by $\tilde{\psi}$. The cost-complexity of a subtree ψ is defined as

$$L_\nu(\psi) = L(\psi) + \nu|\tilde{\psi}|,$$

where $L(\psi) = \sum_{g \in \tilde{\psi}} L_2(g)$ is the loss of the subtree ψ, $|\tilde{\psi}|$ denotes the size of the tree ψ, i.e., its number of terminal nodes, and ν is called the complexity parameter. Whether the descendants of node g should be pruned is decided by comparing the two quantities

$$L_\nu(\psi_g) = L(\psi_g) + \nu|\tilde{\psi}_g| \quad \text{and} \quad L_\nu(g) = L(g) + \nu,$$

where $L_\nu(\psi_g)$ and $L_\nu(g)$ are respectively the cost-complexities of the tree ψg rooted at node g (containing g and all its descendant nodes) and of the tree having only node g, i.e., the tree ψg with all its descendant nodes collapsed into node g. The quantity

$$\nu_g = \frac{L(g) - L(\psi_g)}{|\tilde{\psi}_g| - 1}$$

is the critical point at which $L_\nu(\psi_g)$ equals $L_\nu(g)$. If ν exceeds νg, then $L_\nu(\psi_g) > L_\nu(g)$, which suggests that it is not worthwhile to split node g further, and all descendant nodes of g should be collapsed into g (i.e., pruned). The pruning process starts with the weakest-linked node: νg is calculated for all non-terminal nodes in the large tree ψ0, and the descendants of the node with the smallest νg are pruned. Working with the resulting subtree, the νg's of its non-terminal nodes are recalculated and the weakest-linked node is pruned off. The pruning continues


until only the top node is left. This creates a sequence of subtrees of ψ0 corresponding to the sequence of critical points of the weakest-linked nodes of the subtrees. The sequence of critical points is denoted by ν1, ν2, ..., νM, where M is the number of subtrees in the sequence, and each point corresponds to a subtree.
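The weakest-link computation above can be sketched on a toy dict-based tree; this is an illustrative sketch (not the paper's implementation), with node losses assumed to be given:

```python
# Weakest-link critical point nu_g = (L(g) - L(psi_g)) / (|terminal(psi_g)| - 1)
# computed on a small dict-based tree.

def terminal_loss_and_size(node):
    """Return (sum of terminal-node losses, number of terminal nodes) of the subtree."""
    if "children" not in node:
        return node["loss"], 1
    loss = size = 0
    for child in node["children"]:
        l, s = terminal_loss_and_size(child)
        loss += l
        size += s
    return loss, size

def critical_point(node):
    """nu_g: the complexity value at which collapsing the subtree rooted at g breaks even."""
    subtree_loss, n_terminals = terminal_loss_and_size(node)
    return (node["loss"] - subtree_loss) / (n_terminals - 1)

# Toy tree: a root with loss 10 split into two terminal nodes with losses 3 and 2.
root = {"loss": 10.0, "children": [{"loss": 3.0}, {"loss": 2.0}]}
nu_root = critical_point(root)  # (10 - 5) / (2 - 1) = 5.0
```

For any ν below 5.0 the split is worth keeping; above it, the two children are collapsed into the root.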

(3) Tree Selection. This step selects the best model among the sequence of nested subtrees established in the previous step. Typically, model selection is conducted via a cross-validation procedure (e.g., Breiman et al., 1984, Ch. 8.5), probably the most widely used method for estimating the prediction error of each candidate tree. The prediction error is often, though not necessarily, computed using the same loss function as the one used for tree growing and pruning. To conduct L-fold cross-validation, the data set is divided into L exclusive subsets I1, I2, ..., IL. The cross-validation error for the subtree corresponding to a critical point νm is given by

$$\sum_{\ell=1}^{L}\sum_{i=1}^{I} I\{i \in I_\ell\}\,\bigl(W_i - \psi^{(\ell)}(X_i; \nu_m)\bigr)^2,$$

where $\psi^{(\ell)}(X_i; \nu_m)$ is the predicted value of the ith response under the tree grown on the full data set excluding subset $I_\ell$ and pruned with the complexity parameter fixed at νm. The best tree is selected as the one attaining the smallest cross-validation error.

As described above, the loss function plays a fundamental role in building the large tree, creating the sequence of candidate trees, and selecting the best tree via the cross-validation procedure. The CART algorithm can be implemented using the rpart package (Therneau and Atkinson, 2011) in the statistical software R (R Core Team, 2014).

5.2 Credibility Regression Trees

As mentioned at the end of Section 4, in this paper we propose novel credibility regression trees by altering the loss function and the cross-validation procedure of the default regression tree algorithm. The loss function and cross-validation procedure we propose are as follows.

(1) Loss Functions. The loss function is the key component of the CART algorithm, and its choice determines the decision rules in the tree growing, pruning and selection steps; we would expect different decisions to be made, and eventually a different tree structure to be built, when a different loss function is adopted. In the tree growing and pruning steps, the credibility regression tree algorithm uses one of the credibility loss functions given in (4.15), (4.16), (4.19) and (4.20) in place of the default L2 loss function (5.21).


(2) Longitudinal Cross-validation. The cross-validation procedure in CART divides the collective I = {1, 2, ..., I} into L subsets, and the cross-validation error estimates the prediction error when interest lies in predicting individual premiums for new policyholders with no claims experience. In the context of credibility theory, however, the interest lies in predicting individual net premiums for insureds who already have claims experience with the insurance company, and thus a new cross-validation procedure is required. The procedure we propose divides the individual claims experiences into L subsets instead, and accordingly we call it longitudinal cross-validation.

Recall that we observe (Yi, Xi) for each individual risk i from the collective, where Yi = (Yi,1, ..., Yi,ni)^T denotes the observed claims experience of individual risk i and Xi is the associated covariate vector, i = 1, 2, ..., I. The longitudinal cross-validation procedure we adopt for the credibility regression tree algorithm takes the following steps:

Step 1. For each i = 1, ..., I, denote $b_i = \lfloor n_i/L \rfloor$ and $\ell_i = n_i \bmod L$, and define the sequence

$$T_i = \Bigl\{\underbrace{B, \ldots, B}_{b_i \text{ blocks}},\ 1, 2, \ldots, \ell_i\Bigr\},$$

where B = {1, 2, ..., L}. For example, when conducting 5-fold cross-validation, Ti = {1, 2, 3} for ni = 3, and Ti = {1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3} for ni = 13. For each i = 1, ..., I, independently sample a number Vi,j from the sequence Ti without replacement, for j = 1, ..., ni.

Step 2. Denote

$$J_{i,\ell} := \{j \in \{1, \ldots, n_i\} : V_{i,j} = \ell\} \quad \text{and} \quad {}^{\ell}Y_i = \{Y_{i,j} : j \in J_{i,\ell}\}, \quad \ell = 1, \ldots, L.$$

We can then divide the whole dataset of the collective into L subsets:

$$\bigl\{({}^{\ell}Y_i, X_i) : {}^{\ell}Y_i \neq \emptyset,\ i \in \mathcal{I}\bigr\}, \quad \ell = 1, \ldots, L.$$

Step 3. As in the CART algorithm, the pruning step of the credibility regression tree results in a sequence of nested candidate trees associated with a sequence of complexity parameters ν1, ..., νM. The cross-validation error for the candidate tree corresponding to νm is computed as

$$\sum_{\ell=1}^{L}\sum_{i=1}^{I}\sum_{j=1}^{n_i} I\{j \in J_{i,\ell}\}\,\bigl(Y_{i,j} - \psi^{(\ell)}(X_i; \nu_m)\bigr)^2,$$

where $\psi^{(\ell)}(X_i; \nu_m)$ is the predicted value of the ith response under the tree grown using the data excluding the set $\{{}^{\ell}Y_i,\ i \in \mathcal{I}\}$ and pruned at the fixed complexity parameter νm.

Step 4. Select the tree which leads to the smallest cross-validation error.
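Steps 1 and 2 amount to a stratified assignment of each risk's claims to folds. A minimal Python sketch (with a hypothetical helper name `longitudinal_folds`; not from the paper) is:

```python
import random

def longitudinal_folds(n_claims, L, seed=0):
    """Assign each of a risk's n_i claims to one of L folds by sampling without
    replacement from T_i = {B, ..., B (floor(n_i/L) blocks), 1, ..., n_i mod L},
    where B = {1, ..., L}, as in Steps 1-2 of the longitudinal cross-validation."""
    rng = random.Random(seed)
    folds = {}
    for i, n_i in enumerate(n_claims):
        b_i, ell_i = divmod(n_i, L)
        T_i = list(range(1, L + 1)) * b_i + list(range(1, ell_i + 1))
        rng.shuffle(T_i)  # sampling all of T_i without replacement = a random permutation
        folds[i] = T_i  # folds[i][j] is the fold label V_{i,j+1} of claim j+1
    return folds

# Three risks with n_i = 3, 13, 5 claims under 5-fold cross-validation.
folds = longitudinal_folds(n_claims=[3, 13, 5], L=5)
```

By construction, a risk's claims are spread as evenly as possible across the L folds, so every fold retains some claims experience for (almost) every risk.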


5.3 Premium Calculation

Following the procedure described in Section 5.2, we can use one of the four loss functions defined in (4.15), (4.16), (4.19) and (4.20) to build a regression tree. The terminal nodes of the established credibility regression tree partition the sample space into a number, say K, of sub-collectives,

$$I_k = \{i \in \mathcal{I} : X_i \in A_k\}, \quad k = 1, \ldots, K,$$

where $\{A_k, k = 1, \ldots, K\}$ is a partition of the covariate space. Within each sub-collective $k \in \{1, \ldots, K\}$, individual premiums can be computed by formula (4.18) as described in Section 4.

6 Simulation Studies: Balanced Claims Model

6.1 Simulation Schemes

We consider a collective of I = 300 individual risks from Model 3, the Balanced Claims Model, and for each i = 1, ..., I, independently simulate a covariate vector Xi = (Xi,1, ..., Xi,5), where Xi,1, ..., Xi,5 are independent and identically distributed from the discrete uniform distribution over {1, ..., 100}. Our simulation studies are based on the following four distinct schemes, where the first scheme is the base and the others are modifications of it from different perspectives.

Scheme 1 (Base simulation scheme)

For each i = 1, ..., I, independently simulate εi,j from a given distribution function F(·), for j = 1, ..., n, and define the n claims of individual risk i as

$$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (6.22)$$

where

$$f(X_i) = 0.01\Bigl(X_{i,1} + 2X_{i,2} - X_{i,3} + 2\sqrt{X_{i,1}X_{i,3}} - \sqrt{X_{i,2}X_{i,4}}\Bigr). \qquad (6.23)$$

In our simulation, F takes one of the following three distributions:

(1) EXP(1.6487): an exponential distribution with mean 1.6487, i.e., $F(x) = 1 - e^{-x/1.6487}$, x ≥ 0;

(2) LN(0, 1): a log-normal distribution with parameters 0 and 1, i.e., $\ln \varepsilon_{i,j} \sim N(0, 1)$;

(3) PAR(3, 3.2974): a Pareto distribution with $F(x) = 1 - \left(\frac{3.2974}{x + 3.2974}\right)^3$, x ≥ 0.

The three distributions above are selected to reflect different levels of heavy-tailedness, and their parameter values are set so that all three have the same expected value. The terms $2\sqrt{X_{i,1}X_{i,3}}$ and $\sqrt{X_{i,2}X_{i,4}}$ are included


in the expression of f(Xi) to reflect nonlinear and interaction effects of the covariates on the claim distribution. The response variable Yi,j depends only on the first four covariates; the fifth is included as a noise variable to mimic the scenario in which some covariates included in the analysis are not influential for the claims Yi,j. In Scheme 1, the mean and the variance of individual risk i are respectively given by

$$\mu(X_i, \Theta_i) = e^{f(X_i)} + E[\varepsilon] \approx e^{f(X_i)} + 1.65 \qquad (6.24)$$

and

$$\sigma^2(X_i, \Theta_i) = \mathrm{Var}[\varepsilon] \approx \begin{cases} 1.6487, & \text{for EXP}(1.6487), \\ 4.6708, & \text{for } LN(0, 1), \\ 8.1181, & \text{for PAR}(3, 3.2974), \end{cases} \qquad (6.25)$$

for i = 1, . . . , I, where E[ε] and Var[ε], respectively, denote the mean and the variance of the distribution

function F .
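For concreteness, Scheme 1 with EXP(1.6487) errors can be simulated in a few lines of Python; this is an illustrative sketch of the scheme as stated (variable names are ours, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(1)
I, n = 300, 5

# Covariates X_{i,1}, ..., X_{i,5}: iid discrete uniform on {1, ..., 100}.
X = rng.integers(1, 101, size=(I, 5)).astype(float)

def f(X):
    # f(X_i) in (6.23); only the first four covariates enter, X_{i,5} is noise.
    return 0.01 * (X[:, 0] + 2 * X[:, 1] - X[:, 2]
                   + 2 * np.sqrt(X[:, 0] * X[:, 2]) - np.sqrt(X[:, 1] * X[:, 3]))

# Scheme 1 claims with EXP(1.6487) errors: Y_{i,j} = e^{f(X_i)} + eps_{i,j}, as in (6.22).
eps = rng.exponential(scale=1.6487, size=(I, n))
Y = np.exp(f(X))[:, None] + eps
```

Each row of `Y` is one risk's n = 5 claims; swapping the error sampler for a log-normal or Pareto generator reproduces the other two cases.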

As shown in (6.25), the variance of the claim random variable for each individual risk simulated from Scheme 1 is the same across the collective and does not depend on the covariates. To reflect the scenario where the variance of the claim random variable may differ across the collective of individual risks, we design the following scheme, which differs from Scheme 1 in the way the variables εi,j are generated.

Scheme 2

For each i = 1, ..., I, independently simulate εi,j from a distribution function F(·; Xi), which depends on the covariate Xi of individual risk i, for j = 1, ..., n, and define the n claims of individual risk i as

$$Y_{i,j} = e^{f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (6.26)$$

where f(Xi) is given by (6.23). We respectively consider three distinct distributions for F(·; Xi):

$$(1)\ \mathrm{EXP}\bigl(e^{\gamma(X_i)/2}\bigr), \qquad (2)\ LN\bigl(0, \gamma(X_i)\bigr), \qquad (3)\ \mathrm{PAR}\bigl(3, 2e^{\gamma(X_i)/2}\bigr), \qquad (6.27)$$

where $\gamma(X_i) = \frac{1}{10^2}\bigl|2X_{i,1} - X_{i,2} + \sqrt{X_{i,1}X_{i,2}}\bigr|$.

The function γ(Xi) in Scheme 2 is normalized by 10^2 to attain E[γ(Xi)] = 1, which is the value of the scale parameter in the log-normal distribution used in Scheme 1. The mean and the variance of individual risk i are respectively given by

$$\mu(X_i, \Theta_i) = e^{f(X_i)} + E[\varepsilon_{i,1}] = \begin{cases} e^{f(X_i)} + e^{\gamma(X_i)/2}, & \text{for EXP}\bigl(e^{\gamma(X_i)/2}\bigr), \\ e^{f(X_i)} + e^{\gamma(X_i)/2}, & \text{for } LN\bigl(0, \gamma(X_i)\bigr), \\ e^{f(X_i)} + 2e^{\gamma(X_i)/2}, & \text{for PAR}\bigl(3, 2e^{\gamma(X_i)/2}\bigr), \end{cases} \qquad (6.28)$$

and

$$\sigma^2(X_i, \Theta_i) = \mathrm{Var}[\varepsilon_{i,1}] = \begin{cases} e^{\gamma(X_i)}, & \text{for EXP}\bigl(e^{\gamma(X_i)/2}\bigr), \\ \bigl(e^{\gamma(X_i)} - 1\bigr)e^{\gamma(X_i)}, & \text{for } LN\bigl(0, \gamma(X_i)\bigr), \\ 3e^{\gamma(X_i)}, & \text{for PAR}\bigl(3, 2e^{\gamma(X_i)/2}\bigr), \end{cases} \qquad (6.29)$$

for i = 1, . . . , I.
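As a quick sanity check on the normalization of γ(Xi), a Monte Carlo estimate of E[γ(Xi)] (our illustration, not from the paper) should land near 1:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 200_000

# gamma(X_i) = |2 X_{i,1} - X_{i,2} + sqrt(X_{i,1} X_{i,2})| / 10^2,
# with X_{i,1}, X_{i,2} iid discrete uniform on {1, ..., 100}.
X1 = rng.integers(1, 101, size=N).astype(float)
X2 = rng.integers(1, 101, size=N).astype(float)
gamma = np.abs(2 * X1 - X2 + np.sqrt(X1 * X2)) / 100.0

mean_gamma = gamma.mean()  # should be close to 1 by the normalization in Scheme 2
```

The estimate confirms that dividing by 10^2 puts the scale parameter on the same footing as the LN(0, 1) case of Scheme 1.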

Note that the claim distribution of each individual risk i is fully determined by its covariate Xi in simulation Schemes 1 and 2. This means that the risk profile is completely characterized by the observable covariates Xi and there is no random effect on the claims distribution. In insurance practice, however, two insureds with exactly the same covariate values may have different risk profiles, so it is reasonable to take a random effect component into account when modelling the claims distribution. In light of this observation, we design simulation Schemes 3 and 4.

Scheme 3

For each i = 1, ..., I,

(1) independently simulate a random effect variable Θi from the uniform distribution U(0.9, 1.1);

(2) independently simulate εi,j from a distribution function F(·; Xi), which depends on the covariates Xi associated with risk i, for j = 1, ..., n; and

(3) define the n claims of individual risk i as

$$Y_{i,j} = \Theta_i\bigl[e^{f(X_i)} + \varepsilon_{i,j}\bigr], \quad j = 1, \ldots, n, \qquad (6.30)$$

where f(Xi) is given by (6.23). In our simulation studies, we consider each of the three distributions described in Scheme 2 for the distribution F(·; Xi) in step (2) of the present scheme.

In Scheme 3, the random variable Θi multiplies $\bigl[e^{f(X_i)} + \varepsilon_{i,j}\bigr]$, j = 1, ..., n, in generating the claims Yi,j, reflecting a random effect on the claims distribution. The claim distributions of two risks with the same covariates are no longer identical under Scheme 3, because the claim distribution depends on the random effect variable Θi in addition to the covariate Xi. This also means that the profile of an individual risk i simulated under Scheme 3 can no longer be fully captured by its covariates Xi. Accordingly, for each individual risk i = 1, 2, ..., I, the mean and the variance of the risk are respectively given by

$$\mu(X_i, \Theta_i) = \Theta_i\bigl(e^{f(X_i)} + E[\varepsilon_{i,1}]\bigr) = \begin{cases} \Theta_i\bigl(e^{f(X_i)} + e^{\gamma(X_i)/2}\bigr), & \text{for EXP}\bigl(e^{\gamma(X_i)/2}\bigr), \\ \Theta_i\bigl(e^{f(X_i)} + e^{\gamma(X_i)/2}\bigr), & \text{for } LN\bigl(0, \gamma(X_i)\bigr), \\ \Theta_i\bigl(e^{f(X_i)} + 2e^{\gamma(X_i)/2}\bigr), & \text{for PAR}\bigl(3, 2e^{\gamma(X_i)/2}\bigr), \end{cases} \qquad (6.31)$$

and

$$\sigma^2(X_i, \Theta_i) = \Theta_i^2\,\mathrm{Var}[\varepsilon_{i,1}] = \begin{cases} \Theta_i^2 e^{\gamma(X_i)}, & \text{for EXP}\bigl(e^{\gamma(X_i)/2}\bigr), \\ \Theta_i^2\bigl(e^{\gamma(X_i)} - 1\bigr)e^{\gamma(X_i)}, & \text{for } LN\bigl(0, \gamma(X_i)\bigr), \\ 3\Theta_i^2 e^{\gamma(X_i)}, & \text{for PAR}\bigl(3, 2e^{\gamma(X_i)/2}\bigr). \end{cases} \qquad (6.32)$$

Note that in Scheme 3 the variance of the individual claim distribution depends on the random effect variable Θi in addition to the covariate Xi.

Scheme 3 contains a multiplicative random effect. We next consider a different way of incorporating a random effect term into the claim distribution, as described in Scheme 4.

Scheme 4

For each i = 1, ..., I, let Θi = (ξi,1, ξi,2)^T:

(1) independently simulate random effect variables ξi,1 and ξi,2 from the uniform distribution U(0.9, 1.1);

(2) independently simulate εi,j from a distribution function F(·; Xi, ξi,2), which depends on the covariate Xi and the random effect variable ξi,2, for j = 1, ..., n; and

(3) define the n claims of individual risk i as

$$Y_{i,j} = e^{\xi_{i,1} f(X_i)} + \varepsilon_{i,j}, \quad j = 1, \ldots, n, \qquad (6.33)$$

where f(Xi) is defined in (6.23). We consider the three distributions defined in equation (6.27) in Scheme 2, with γ(Xi) replaced by ξi,2·γ(Xi), so that the distribution of εi,j depends on the random effect variable ξi,2 in addition to the covariate Xi.

For individual risks i = 1, ..., I from Scheme 4, the mean and the variance of each individual risk are respectively given by

$$\mu(X_i, \Theta_i) = e^{\xi_{i,1} f(X_i)} + E[\varepsilon_{i,1}] = \begin{cases} e^{\xi_{i,1} f(X_i)} + e^{\xi_{i,2}\gamma(X_i)/2}, & \text{for EXP}\bigl(e^{\xi_{i,2}\gamma(X_i)/2}\bigr), \\ e^{\xi_{i,1} f(X_i)} + e^{\xi_{i,2}\gamma(X_i)/2}, & \text{for } LN\bigl(0, \xi_{i,2}\gamma(X_i)\bigr), \\ e^{\xi_{i,1} f(X_i)} + 2e^{\xi_{i,2}\gamma(X_i)/2}, & \text{for PAR}\bigl(3, 2e^{\xi_{i,2}\gamma(X_i)/2}\bigr), \end{cases} \qquad (6.34)$$

and

$$\sigma^2(X_i, \Theta_i) = \mathrm{Var}[\varepsilon_{i,1}] = \begin{cases} e^{\xi_{i,2}\gamma(X_i)}, & \text{for EXP}\bigl(e^{\xi_{i,2}\gamma(X_i)/2}\bigr), \\ \bigl(e^{\xi_{i,2}\gamma(X_i)} - 1\bigr)e^{\xi_{i,2}\gamma(X_i)}, & \text{for } LN\bigl(0, \xi_{i,2}\gamma(X_i)\bigr), \\ 3e^{\xi_{i,2}\gamma(X_i)}, & \text{for PAR}\bigl(3, 2e^{\xi_{i,2}\gamma(X_i)/2}\bigr). \end{cases} \qquad (6.35)$$


For each of the four simulation schemes designed above and each of the three distributions involved (exponential, log-normal, and Pareto), we independently simulate 1,000 samples of the claims experience of I = 300 individual risks, with n = 5, 10 and 20, respectively. For each simulated sample, we use the L2 loss function (5.21) and the four credibility loss functions defined in (4.15), (4.16), (4.19) and (4.20) to grow and prune regression trees, and the best tree is selected using longitudinal cross-validation. In addition, we consider ad hoc covariate-dependent partitions, defined via the following notation:

$$\mathcal{P}(X_j) = \bigl\{\{i \in \mathcal{I} : X_{i,j} \leq 50\},\ \{i \in \mathcal{I} : X_{i,j} > 50\}\bigr\}, \quad j = 1, \ldots, 5,$$

and

$$\mathcal{P}(X_{j_1}, X_{j_2}) = \bigl\{\{i \in \mathcal{I} : X_{i,j_1} \leq 50,\, X_{i,j_2} \leq 50\},\ \{i \in \mathcal{I} : X_{i,j_1} \leq 50,\, X_{i,j_2} > 50\},\ \{i \in \mathcal{I} : X_{i,j_1} > 50,\, X_{i,j_2} \leq 50\},\ \{i \in \mathcal{I} : X_{i,j_1} > 50,\, X_{i,j_2} > 50\}\bigr\},$$

for j1, j2 = 1, ..., 5 with j1 ≠ j2. P(Xj) represents an equal binary partition based on a single covariate, and P(Xj1, Xj2) one based on two covariates. Partitions based on three or four covariates, denoted P(Xj1, Xj2, Xj3) and P(Xj1, Xj2, Xj3, Xj4), are defined in the same manner. In our simulations, P(X2), P(X4), P(X1, X2, X3), P(X1, X2, X4), P(X2, X3, X4), P(X1, X3, X4), and P(X1, X2, X3, X4) are considered.
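The ad hoc partitions can be built as index sets by thresholding the chosen covariates at 50; the sketch below (our illustration; `adhoc_partition` is a hypothetical helper) constructs the 2^m cells of P(X_{j1}, ..., X_{jm}):

```python
import numpy as np
from itertools import product

def adhoc_partition(X, cols, cut=50):
    """Partition risk indices by thresholding each covariate in `cols` at `cut`,
    yielding the 2^len(cols) cells of P(X_{j1}, ..., X_{jm})."""
    cells = []
    for signs in product([False, True], repeat=len(cols)):
        mask = np.ones(len(X), dtype=bool)
        for col, above in zip(cols, signs):
            mask &= (X[:, col] > cut) if above else (X[:, col] <= cut)
        cells.append(np.flatnonzero(mask))  # indices of risks in this cell
    return cells

rng = np.random.default_rng(7)
X = rng.integers(1, 101, size=(300, 5)).astype(float)
cells = adhoc_partition(X, cols=[0, 1])  # P(X_1, X_2): four cells
```

The cells are mutually exclusive and exhaustive, so every one of the 300 risks falls in exactly one cell.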

6.2 Performance Measure: Prediction Error

To compare the prediction performance of the credibility premiums resulting from the various (ad hoc or data-driven) partitioning methods, we compute the prediction error for a given partition {I1, ..., IK} as

$$PE = \frac{1}{I}\sum_{i=1}^{I}\sum_{k=1}^{K} I\{i \in I_k\}\,\bigl(\pi^{(H)(k)}_i - \mu(X_i, \Theta_i)\bigr)^2, \qquad (6.36)$$

where the form of $\pi^{(H)(k)}_i$ is given in (4.18), and the net premium of each individual risk, μ(Xi, Θi), is given in (6.24), (6.28), (6.31) and (6.34) for Schemes 1-4, respectively. In addition, we consider the credibility premium $P^{(H)}_i$ in (2.4), obtained without partitioning the collective. The resulting collective prediction error is computed as

$$PE_0 = \frac{1}{I}\sum_{i=1}^{I}\bigl(P^{(H)}_i - \mu(X_i, \Theta_i)\bigr)^2,$$

which does not use any covariate information and is anticipated to underperform the various covariate-dependent partitions. We define the relative prediction error (RPE) of each covariate-dependent partition as the ratio of its prediction error to the collective prediction error, and denote it by R = PE/PE0. A smaller prediction error or relative prediction error indicates better performance.
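Given the predicted premiums matched to each risk, PE in (6.36) is just a mean squared deviation from the true net premiums; a toy Python sketch (our illustration, with made-up numbers) of PE and the relative prediction error R follows:

```python
import numpy as np

def prediction_error(premiums, mu):
    """PE as in (6.36): mean squared deviation of the predicted premiums from
    the true net premiums; premiums[i] is the prediction for risk i, already
    matched to the sub-collective containing that risk."""
    premiums, mu = np.asarray(premiums), np.asarray(mu)
    return float(np.mean((premiums - mu) ** 2))

# Toy illustration: true net premiums and two prediction rules.
mu = np.array([1.0, 2.0, 3.0, 4.0])
collective = np.full(4, mu.mean())             # no partitioning (gives PE_0)
partitioned = np.array([1.4, 1.6, 3.5, 3.5])   # premiums from a 2-cell partition

pe0 = prediction_error(collective, mu)
pe = prediction_error(partitioned, mu)
rpe = pe / pe0  # relative prediction error R = PE / PE_0; below 1 means the partition helps
```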


6.3 Simulation Results

The performance, in terms of RPE, of the various covariate-dependent partitioning methods under the Balanced Claims Model with n = 5 is summarized in Figure 2, which contains 12 plots covering the three distributions (exponential, log-normal, Pareto) and the four simulation schemes described in Section 6.1. The 12 boxplots in each plot correspond, respectively, to the four credibility regression tree premiums (highlighted in grey) using credibility loss functions (4.15), (4.16), (4.19) and (4.20) (R1-R4), the L2 regression tree premium (RL2), partition P(X2) (R(2)), partition P(X4) (R(4)), partition P(X1, X2, X3) (R(123)), partition P(X1, X2, X4) (R(124)), partition P(X2, X3, X4) (R(234)), partition P(X1, X3, X4) (R(134)), and partition P(X1, X2, X3, X4) (R(1234)).

We have the following interesting observations from Figure 2.

1. Almost all the boxes in Figure 2 lie below the value 1, the exception being the one resulting from the ad hoc partition using covariate Xi,4, labeled R(4). An RPE smaller than 1 indicates more accurate prediction than the credibility premium for which no partitioning is performed. Thus, Figure 2 shows that all the covariate-dependent partitioning methods we considered, except P(X4), enhance the prediction accuracy of the credibility premium. Revisiting the form of f(Xi) in (6.23), one can see that Xi,4 appears only in an interaction term, and it may not be informative enough on its own to enhance prediction accuracy.

2. In each of the 12 plots, the first four boxplots, in grey, summarize the RPE's from the 1,000 simulations using the four proposed credibility regression trees. These four boxplots sit at a similar level and take a similar shape across all 12 plots, indicating that the four credibility losses perform almost equally well when used to build the regression tree credibility premium for predicting individual net premiums.

3. The four grey boxplots are consistently positioned lower than those of the other prediction rules across all the plots in Figure 2, indicating that our regression tree credibility premium outperforms the others across all the simulation schemes and all the distribution types in our numerical study.

4. The L2 regression tree performs consistently worse than the four credibility regression trees, but it

performs either better than or as well as each of the 7 ad hoc partitioning methods.

5. Comparing the different columns of Figure 2, one can see that the relative prediction errors tend to be smaller for the Pareto distribution, whereas the exponential distribution tends to yield larger relative prediction errors. As we know, the Pareto distribution is commonly perceived as a heavy-tailed distribution,


the exponential distribution is light-tailed, and the log-normal is in between. Therefore, such a comparison suggests that our regression tree based methods can improve relative prediction accuracy to a larger extent for a heavy-tailed claim distribution than for a light-tailed one.

[Figure 2 appears here: Boxplots of RPE's for the Balanced Claims Model with n = 5, for the three distributions (columns: Exponential, Log-normal, Pareto) and the four simulation schemes (rows: Schemes 1-4), with the 12 boxplots in each panel labeled R1-R4, RL2, R(2), R(4), R(123), R(124), R(234), R(134), R(1234); the plotted values are not recoverable from the extracted text.]

6. The plots in the third and fourth rows contain the results from simulation Schemes 3 and 4, respectively, where the individual net premium depends on unobservable random effect term(s). Compared with those in the first and second rows, they do not display an obvious shift in position or shape, which suggests that the presence of random effects does not deteriorate the prediction performance of the regression tree credibility premium.

7. The rightmost 7 boxplots in each plot respectively use the ad hoc partitions P(X2), P(X4), P(X1, X2, X3), P(X1, X2, X4), P(X2, X3, X4), P(X1, X3, X4) and P(X1, X2, X3, X4). These partitions are all based on the four influential covariates Xi,1, Xi,2, Xi,3 and Xi,4; the noise covariate Xi,5 is not involved. In contrast, we did not exclude Xi,5 from the list of covariates by any ex ante variable selection procedure when building the credibility regression trees and the L2 regression tree. The boxplots in Figure 2 nevertheless show that the regression tree based methods yield significantly smaller RPE's than the ad hoc partitioning methods, even though they are disadvantaged by the inclusion of the noise covariate Xi,5.

8. The relative prediction errors of partitions P(X1, X2, X3) and P(X1, X2, X4) are in general smaller than those of partitions P(X2, X3, X4) and P(X1, X3, X4), even though the four partitioning methods produce the same number of sub-collectives. This suggests that what matters is not the number of sub-collectives into which the collective is partitioned, but the form of the partition. A partition formed in an ad hoc manner does not guarantee a promising prediction result, and it is therefore necessary to adopt data-driven partitioning algorithms, such as our regression tree based model, to develop accurate premium predictions.

We also examine the performance of the prediction rules when the number of individual claims is increased to n = 10 and n = 20; the resulting RPE's are reported in the boxplots of Figures 4 and 5 in Appendix B, to which all the comments made for n = 5 also apply.

We further explore how the prediction accuracy of these 13 prediction rules changes as the claims experience of individual risks grows, as captured by the parameter n in our simulation schemes. To this end, we compute the average (absolute) prediction error over the 1,000 simulations for each combination of a value of n (5, 10, or 20), a distribution (exponential, log-normal, or Pareto) and one of the four simulation schemes defined previously. Table 1 reports the results for the base prediction rule (PE0), the CRT premium using credibility loss function (4.19) (PE3), the L2 regression tree


Table 1: The average of prediction errors from 1,000 simulations for the Balanced Claims Model using various covariate-dependent partitionings.

             Scheme 1            Scheme 2            Scheme 3            Scheme 4
             EXP   LN    PAR     EXP   LN    PAR     EXP   LN    PAR     EXP   LN    PAR
n = 5
  PE0        0.526 0.884 1.445   0.645 2.038 1.731   0.648 2.070 1.801   0.647 2.143 1.860
  PE3        0.406 0.576 0.797   0.540 1.247 1.064   0.549 1.279 1.105   0.545 1.296 1.150
  PEL2       0.420 0.640 0.914   0.554 1.354 1.189   0.560 1.386 1.243   0.560 1.432 1.285
  PE(1234)   0.449 0.672 0.925   0.580 1.314 1.200   0.584 1.313 1.230   0.585 1.347 1.283
n = 10
  PE0        0.268 0.462 0.788   0.328 1.180 0.941   0.328 1.153 0.969   0.331 1.130 0.963
  PE3        0.224 0.339 0.486   0.295 0.782 0.651   0.297 0.804 0.671   0.299 0.776 0.666
  PEL2       0.231 0.369 0.550   0.300 0.852 0.718   0.301 0.858 0.742   0.304 0.869 0.738
  PE(1234)   0.243 0.387 0.568   0.310 0.822 0.732   0.311 0.828 0.749   0.313 0.830 0.744
n = 20
  PE0        0.135 0.234 0.415   0.166 0.620 0.503   0.167 0.600 0.511   0.166 0.631 0.506
  PE3        0.120 0.190 0.288   0.156 0.469 0.389   0.158 0.466 0.391   0.157 0.479 0.393
  PEL2       0.123 0.203 0.318   0.157 0.498 0.415   0.159 0.492 0.418   0.158 0.519 0.423
  PE(1234)   0.128 0.211 0.327   0.160 0.486 0.419   0.162 0.484 0.419   0.161 0.498 0.425

premium using loss function (5.21) (PEL2), and the ad hoc partitioning method using all four influential covariates (PE(1234)). We choose only PE3 as a representative of the four credibility loss based prediction rules in Table 1 because, as the plots in Figures 2, 4 and 5 show, these four rules all perform similarly. Further, among the 7 ad hoc partitions we report only PE(1234) in Table 1, because it performs the best of the group, as indicated by the boxplots in Figures 2, 4 and 5.

Table 1 demonstrates that the average prediction error decreases significantly as the claims experience grows, and this trend holds across all combinations of the four prediction rules, the four simulation schemes, and the three distribution types considered. Take the column for Scheme 1 with the exponential distribution as an example: the prediction errors lie within [0.406, 0.526] for n = 5, shift to the range [0.224, 0.268] for n = 10, and to the even smaller range [0.120, 0.135] for n = 20. This result is consistent with the intuition that prediction should become more accurate as more information is brought into it. Further, for each cell in Table 1, we compute the standard error as the empirical standard deviation of the corresponding 1,000 simulated PE's divided by the square root of 1,000. The largest standard error among all the cells in the panel with n = 5 is 0.048; it is 0.030 for n = 10, and 0.022 for n = 20. These small standard errors confirm that the decrease in prediction error is indeed statistically significant.
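The error summaries above are mechanical to reproduce. The sketch below (illustrative names; it assumes the 1,000 per-simulation prediction errors are available as an array) shows how the reported average and its Monte Carlo standard error are computed:

```python
import numpy as np

def summarize_prediction_errors(pe):
    """Average prediction error and its Monte Carlo standard error.

    pe : per-simulation (absolute) prediction errors, one entry per
         simulated sample (1,000 entries in the paper's experiments).
    """
    pe = np.asarray(pe, dtype=float)
    avg = pe.mean()                          # the value reported in Table 1
    se = pe.std(ddof=1) / np.sqrt(len(pe))   # empirical sd / sqrt(#simulations)
    return avg, se

# Toy illustration with stand-in simulated errors:
rng = np.random.default_rng(0)
pe = rng.exponential(scale=0.5, size=1000)
avg, se = summarize_prediction_errors(pe)
```

A standard error an order of magnitude below the gap between two table cells, as observed here, is what justifies calling the reduction statistically significant.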


7 Simulation Studies: Unbalanced Claims Model

The simulation studies in Section 6 investigated the prediction performance of various premium prediction rules under the Balanced Claims Model; in this section, we use simulation studies to demonstrate their performance for the general Unbalanced Claims Model, i.e., Model 2, where the distribution of the claims experience Yi,j of individual risk i depends on its covariate Xi, its risk profile Θi, and an additional quantity mi,j, called the risk exposure or weight variable; mi,j can be interpreted as the number of years at risk of individual risk i in year j.

In insurance practice, the individual risks in a collective may have claims experiences of different lengths, i.e., the value of ni may vary from one risk to another. To reflect this feature, we simulate the number ni for each risk in the collective from the distribution D(n0) given in Table 2, where the parameter n0 equals the distribution mean. We simulate the risk exposures mi,j independently and identically from a two-point distribution M taking the value 0.5 with probability 0.2 and the value 1 with probability 0.8.

Table 2: Distribution D(n0) for the number of years of claims experience.

k           n0 - 2   n0 - 1   n0    n0 + 1   n0 + 2
Pr(N = k)   1/16     1/8      5/8   1/8      1/16

The risk exposure variables can be interpreted as follows: individual risk i has half a year of claims experience in year j when mi,j = 0.5, and a full year of claims experience in year j when mi,j = 1. Assume that there are five covariate variables Xi = (Xi,1, . . . , Xi,5)T, each drawn from the discrete uniform distribution over the set {1, . . . , 100}.

We consider the following four simulation schemes (labeled Schemes 5-8), parallel to Schemes 1-4 of Section 6, respectively. For each i = 1, . . . , I and j = 1, . . . , ni, we simulate two independent random variables Yi,j,1 and Yi,j,2, identically distributed as the random variable specified for each scheme below:

Scheme 5: the claim random variable in (6.22) from Scheme 1;

Scheme 6: the claim random variable in (6.26) from Scheme 2;

Scheme 7: the claim random variable in (6.30) from Scheme 3;

Scheme 8: the claim random variable in (6.33) from Scheme 4.

The claims under the Unbalanced Claims Model are then given by

    Yi,j = 2 Yi,j,1,           if the simulated mi,j = 0.5,
    Yi,j = Yi,j,1 + Yi,j,2,    if the simulated mi,j = 1.
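The data-generating step just described can be sketched as follows (a minimal illustration with hypothetical names; the exponential sampler stands in for whichever scheme-specific claim distribution is used):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_n(n0):
    """Number of years of claims experience, N ~ D(n0) from Table 2.

    Note the distribution is symmetric around n0, so E[N] = n0."""
    support = [n0 - 2, n0 - 1, n0, n0 + 1, n0 + 2]
    probs = [1 / 16, 1 / 8, 5 / 8, 1 / 8, 1 / 16]
    return rng.choice(support, p=probs)

def simulate_risk(n0=5, claim_sampler=lambda: rng.exponential(1.0)):
    """Simulate (m_{i,j}, Y_{i,j}) for one individual risk under the
    Unbalanced Claims Model."""
    n_i = sample_n(n0)
    exposures, claims = [], []
    for _ in range(n_i):
        # Risk exposure: 0.5 w.p. 0.2 (half-year), 1 w.p. 0.8 (full year).
        m_ij = rng.choice([0.5, 1.0], p=[0.2, 0.8])
        y1, y2 = claim_sampler(), claim_sampler()
        # Claims: 2*Y_{i,j,1} for a half-year, Y_{i,j,1}+Y_{i,j,2} for a full year.
        y_ij = 2 * y1 if m_ij == 0.5 else y1 + y2
        exposures.append(m_ij)
        claims.append(y_ij)
    return exposures, claims

exposures, claims = simulate_risk(n0=5)
```

Repeating `simulate_risk` over all I risks in the collective yields one simulated sample of the unbalanced panel.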


Figure 3: Boxplots of RPE’s for Unbalanced Claims Model for three distributions and four simulation

schemes (Schemes 5-8) using various covariate-dependent partitioning methods.

[Figure 3 panels: rows correspond to Schemes 5-8 and columns to the exponential, log-normal, and Pareto distributions; each boxplot panel compares the prediction rules R1, R2, R3, R4, RL2, R(2), R(4), R(123), R(124), R(234), R(134), and R(1234).]


For each simulation scheme and each distribution type (exponential, log-normal, and Pareto), we independently simulate 1,000 samples and calculate relative prediction errors following the procedure described in Section 6.2, for each of the same 13 prediction rules studied in the preceding section for the Balanced Claims Model. We choose n0 = 5 in the simulation study. The performance of the prediction rules under the Unbalanced Claims Model is summarized by the boxplots in Figure 3, whose rows correspond to Schemes 5-8 and whose columns correspond to the three distributions (exponential, log-normal, and Pareto). The same labels as those used in the preceding section for the Balanced Claims Model are adopted for the boxplots in Figure 3. Figure 3 exhibits the same pattern as Figure 2, and all the comments made on Figure 2 for the Balanced Claims Model apply to the results obtained for the Unbalanced Claims Model, except that the relative prediction errors tend to be larger under unbalanced claims. One of these comments is worth emphasizing again: the four credibility regression tree based models remain the best performers among all 13 prediction rules under consideration.

8 Concluding Remarks

Credibility theory has been widely applied by actuaries for insurance experience rating, and relevant covariate variables have been incorporated into various credibility models to improve the prediction performance of credibility premiums. The existing literature typically links covariate variables to credibility premiums via a pre-specified parametric regression equation and thus lacks flexibility in capturing nonlinear and interaction effects of covariates on individual net premiums. The regression tree credibility (RTC) model proposed in the present paper brings machine learning techniques into the context of credibility theory to enhance the prediction accuracy of the credibility premium, providing a non-parametric alternative to parametric regression based models. In the proposed RTC model, no ex ante analysis of the relationship between individual net premiums and covariate variables is necessary: the designed regression tree algorithm automatically selects influential covariate variables and informative cutting points to form a partition of the data space, upon which a well-performing premium prediction rule can then be established.

Although only Classification and Regression Trees (CART) is employed in this paper to enhance the prediction performance of the credibility premium, it would be fruitful to pursue further research with other recursive partitioning methods, e.g., partDSA (partitioning deletion/substitution/addition algorithm) and MARS (multivariate adaptive regression splines), and even more promising to consider ensemble algorithms such as bagging, boosting, and random forests. These machine learning algorithms constitute a large set of quantitative tools for building effective predictive models. It would likewise be fruitful to consider their applications in insurance problems beyond premium rating, since many practical insurance problems amount to quantifying the relationship between insureds' claims and their demographic information. It is the authors' hope that the present paper will stimulate more actuarial applications of these machine learning techniques and ultimately contribute to the development of insurance predictive analytics in general.

Appendix A

The proof of Proposition 3.1 relies on the concavity of the function

    h(x, y) = 1 / (1/x + 1/y) : (0, ∞) × (0, ∞) → (0, ∞).    (8.37)

Lemma 8.1 The function h(x, y) defined in (8.37) is concave.

Proof. For (x, y) ∈ (0, ∞) × (0, ∞), h is twice differentiable, with first partial derivative h'x(x, y) = (1 + xy⁻¹)⁻² and second partial derivatives

    h''xx(x, y) = -2y⁻¹(1 + xy⁻¹)⁻³ = -2y²(x + y)⁻³,
    h''xy(x, y) = 2xy⁻²(1 + xy⁻¹)⁻³ = 2xy(x + y)⁻³,
    h''yy(x, y) = -2x²(x + y)⁻³.

Let H be the Hessian matrix of h. Then, for any vector a = (a1, a2)' ∈ R²,

    a'Ha = -2y²(x + y)⁻³ a1² + 4xy(x + y)⁻³ a1a2 - 2x²(x + y)⁻³ a2²
         = -2(x + y)⁻³ (a1y - a2x)² ≤ 0,

for any (x, y) ∈ (0, ∞)². Thus the function h(·, ·) is concave on (0, ∞)². □
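The concavity established in Lemma 8.1 can be spot-checked numerically: the quadratic form below uses the closed-form Hessian entries from the proof, and a midpoint-concavity check on random points provides an independent confirmation (a small illustrative script, not part of the paper's argument):

```python
import numpy as np

def h(x, y):
    """h(x, y) = 1 / (1/x + 1/y) = xy / (x + y) on (0, inf)^2."""
    return x * y / (x + y)

def quad_form(x, y, a1, a2):
    """a' H a with the Hessian entries derived in Lemma 8.1."""
    hxx = -2 * y**2 * (x + y) ** -3
    hxy = 2 * x * y * (x + y) ** -3
    hyy = -2 * x**2 * (x + y) ** -3
    return a1**2 * hxx + 2 * a1 * a2 * hxy + a2**2 * hyy

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.uniform(0.1, 10, size=2)
    a1, a2 = rng.normal(size=2)
    # Negative semi-definiteness: a'Ha = -2(x+y)^{-3} (a1*y - a2*x)^2 <= 0.
    assert quad_form(x, y, a1, a2) <= 1e-12
    # Midpoint concavity: h((u+v)/2) >= (h(u) + h(v)) / 2.
    u = rng.uniform(0.1, 10, size=2)
    v = rng.uniform(0.1, 10, size=2)
    mid = (u + v) / 2
    assert h(mid[0], mid[1]) >= (h(*u) + h(*v)) / 2 - 1e-12
```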

Proof of Proposition 3.1. Using the function h, we write the loss L defined in (3.9) as L = I · h(σ²/n, τ²), and similarly L1 = pI · h(σ²(1)/n, τ²(1)) and L2 = qI · h(σ²(2)/n, τ²(2)). Since h(x, y) is concave by Lemma 8.1, we use the facts in equation (3.9) to obtain

    L1 + L2 = pI · h(σ²(1)/n, τ²(1)) + qI · h(σ²(2)/n, τ²(2))
            ≤ I · h( pσ²(1)/n + qσ²(2)/n, pτ²(1) + qτ²(2) )
            = I · h(σ²/n, τ̄²)
            = I / (n/σ² + 1/τ̄²)
            ≤ I / (n/σ² + 1/τ²) = L,

where the first inequality is due to the concavity of the function h, and the second inequality is due to the implication τ̄² ≤ τ² from (3.9). □
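The inequality chain above can also be verified numerically as a sanity check (not part of the proof; it assumes, as in the reconstruction above, that σ² is the (p, q)-mixture of σ²(1) and σ²(2) and that τ̄² = pτ²(1) + qτ²(2) ≤ τ²):

```python
import numpy as np

rng = np.random.default_rng(3)

def h(x, y):
    """h(x, y) = xy / (x + y), the function from Lemma 8.1."""
    return x * y / (x + y)

# Check: p*I*h(s1/n, t1) + q*I*h(s2/n, t2) <= I*h(s/n, t)
# where s = p*s1 + q*s2, tbar = p*t1 + q*t2, and t >= tbar.
I, n = 50, 5
for _ in range(1000):
    p = rng.uniform(0.05, 0.95)
    q = 1 - p
    s1, s2, t1, t2 = rng.uniform(0.1, 10, size=4)
    s = p * s1 + q * s2
    tbar = p * t1 + q * t2
    t = tbar + rng.uniform(0, 5)   # any t >= tbar
    L1 = p * I * h(s1 / n, t1)
    L2 = q * I * h(s2 / n, t2)
    L = I * h(s / n, t)
    assert L1 + L2 <= L + 1e-9     # splitting never increases the total loss
```

Equality is attained when (σ²(1), τ²(1)) and (σ²(2), τ²(2)) are proportional, consistent with h being homogeneous of degree one.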

Appendix B

Boxplots of relative prediction errors under the Balanced Claims Model with n = 10 and n = 20 are presented in Figures 4 and 5. All the comments made for n = 5 in Section 6.3 apply to the two cases n = 10 and n = 20.


Figure 4: Boxplots of RPE’s for Balanced Claims Model with n = 10 for three distributions and four

simulation schemes (Schemes 1-4) using various covariate-dependent partitioning methods.

[Figure 4 panels: rows correspond to Schemes 1-4 and columns to the exponential, log-normal, and Pareto distributions; each boxplot panel compares the prediction rules R1, R2, R3, R4, RL2, R(2), R(4), R(123), R(124), R(234), R(134), and R(1234).]


Figure 5: Boxplots of RPE’s for Balanced Claims Model with n = 20 for three distributions and four

simulation schemes (Schemes 1-4) using various covariate-dependent partitioning methods.

[Figure 5 panels: rows correspond to Schemes 1-4 and columns to the exponential, log-normal, and Pareto distributions; each boxplot panel compares the prediction rules R1, R2, R3, R4, RL2, R(2), R(4), R(123), R(124), R(234), R(134), and R(1234).]


References

[1] Bühlmann, H. (1967). Experience rating and credibility. ASTIN Bulletin 4, 199-207.

[2] Bühlmann, H. (1969). Experience rating and credibility. ASTIN Bulletin 5, 157-165.

[3] Bühlmann, H., Gisler, A. (2005). A Course in Credibility Theory and its Applications. Springer, New York.

[4] Bühlmann, H., Jewell, W.S. (1987). Hierarchical credibility revisited. Bulletin of the Swiss Association of Actuaries, 35-54.

[5] Bühlmann, H., Straub, E. (1970). Glaubwürdigkeit für Schadensätze. Bulletin of the Swiss Association of Actuaries, 111-133.

[6] Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J. (1984). Classification and Regression Trees. The Wadsworth Statistics/Probability Series. Wadsworth International Group, Belmont, CA.

[7] Garrido, J., Zhou, J. (2009). Full credibility with generalized linear and mixed models. ASTIN Bulletin 39(1), 61-80.

[8] Hachemeister, C.A. (1975). Credibility for regression models with application to trend. In Kahn, P.M. (Ed.), Credibility: Theory and Applications. Academic Press, New York.

[9] Hickman, J.C., Heacox, L. (1999). Credibility theory: the cornerstone of actuarial science. North American Actuarial Journal 3(2), 1-8.

[10] Jewell, W.S. (1973). Multidimensional credibility. Report ORC, Berkeley.

[11] Jewell, W.S. (1975). The use of collateral data in credibility: a hierarchical model. Research Memorandum 75-24, International Institute for Applied Systems Analysis, Laxenburg, Austria.

[12] Loh, W.-Y. (2014). Fifty years of classification and regression trees. International Statistical Review 82(3), 329-348.

[13] Nelder, J.A., Verrall, R.J. (1997). Credibility theory and generalized linear models. ASTIN Bulletin 27, 71-82.

[14] Norberg, R. (1986). Hierarchical credibility: analysis of a random effect linear model with nested classification. Scandinavian Actuarial Journal, 204-222.

[15] Ohlsson, E. (2010). Combining generalized linear models and credibility models in practice. Scandinavian Actuarial Journal 4, 301-314.

[16] R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

[17] Sundt, B. (1980). A multi-level hierarchical credibility regression model. Scandinavian Actuarial Journal, 25-32.

[18] Taylor, G.C. (1979). Credibility analysis of a general hierarchical model. Scandinavian Actuarial Journal, 1-12.

[19] Therneau, T.M., Atkinson, B. (2011). rpart: Recursive Partitioning. R package version 4.1-8 (R port by Brian Ripley).

[20] Whitney, A.W. (1918). The theory of experience rating. Proceedings of the Casualty Actuarial Society 4, 274-292.

[21] Zhang, H., Singer, B.H. (2010). Recursive Partitioning and Applications, 2nd edition. Springer, New York.