inna kolyshkina pricewaterhousecoopers - salford...

35
i Using Multivariate Adaptive Regression Splines (MARS®) to enhance Generalised Linear Models. Inna Kolyshkina PriceWaterhouseCoopers

Upload: tranbao

Post on 29-Mar-2018

217 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

i

Using Multivariate Adaptive Regression Splines (MARS®) to enhance Generalised Linear Models.

Inna Kolyshkina PriceWaterhouseCoopers

Page 2: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 2PricewaterhouseCoopers

Why enhance GLM? Shortcomings of the linear modelling approach.

GLM being a linear technique shares the usual shortcomings of the linear modelling approach:– operate under the assumption that data is distributed according to a

distribution in the exponential family– are affected by multi-collinearity, outliers and missing values in the data, – often troublesome to use for selecting important predictors and their

interactions. – categorical predictors with large numbers of categories (for example,

postcode, occupation code etc) can lead to unreliable results due to sparsity-related issues.

Linear models take longer to build because of the need to address the issues above by transforming both numeric and categorical predictors and choosing predictors and their interactions by hand which can prove to be a lengthy task.

Page 3: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 3PricewaterhouseCoopers

Data mining techniques in contrast, are typically fast, easily select predictors and their interactions, are minimally affected with missing values, outliers or collinearity and effectively process high-level categorical predictors.

This suggests that combining a linear approach with data mining tools can expedite the modelling process, allowing the modeller to attain equal or better model accuracy in less time with the same level of interpretability.

Such models, usually combining decision trees, multivariate adaptive regression splines and GLM have been used by our team in a number of projects (see Kolyshkina and Brookes, 2002).

Page 4: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 4PricewaterhouseCoopers

Strengths of GLM can be effectively combined with the computational power of data mining methods

The advance of the new methodologies does not mean that the proven, effective techniques such as GLM should be wholly replaced by them.

This talk:

1. Discusses how the advantages and strengths of GLM can be combined with the computational power of data mining methods presenting an example of the combining (MARS® ) and GLM approaches

2. Provides comparison of the results of this combined model to those achievable by hand-fitted GLM in terms of:

– time taken

– predictive power

– selection of predictors and their interactions

– interpretability of the model

– precision and model fit

Page 5: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

i

MARS ® methodology overview.

Page 6: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 6PricewaterhouseCoopers

MARS® can be described as

Generalization of stepwise linear regression or

Modification of the CART® method for the regression setting (ie for a continuous dependent variable)

Page 7: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 7PricewaterhouseCoopers

MARS® model - knots for continuous predictors

For each continuous predictor X MARS® forms pairs of piecewise linear functions with knots at each observed value t of the predictor:

X1 is equal to

– (X-t) if X >t

– 0 otherwise

X2 is equal to

– (t- X) if X <=t– 0 otherwise

It then builds a model using only these functions and their products. Value t is then called a “knot”

Page 8: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 8PricewaterhouseCoopers

MARS model - knots. Example.

For example, for the variable AGE such piecewise linear functions may be:

AGE1 is equal to

– (AGE-20) if AGE>20

– 0 otherwise

AGE2 is equal to

– (20-AGE) if AGE<=20

– 0 otherwise

Value 20 is then called a “knot”

Page 9: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 9PricewaterhouseCoopers

MARS model - piecewise constant functions for categorical predictors

For categorical predictors MARS considers all possible binary partitions of the categories.

Each such partition generates a pair of piecewise constant basis functions that are indicator functions for the two sets of categories.

Page 10: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 10PricewaterhouseCoopers

MARS model - basis functions

We will call these functions and their products “basis functions”.MARS model is a linear combination of basis functions:

y= a0 + a1*BF1+ a2*BF2 + a3*BF3+ …

The coefficients a0, a1, a3, … are then estimated by minimising the residual sum of squares (like in linear regression).

Page 11: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 11PricewaterhouseCoopers

MARS model building.

The MARS model is built in a two-phase process.

In the first phase, a model is grown by adding basis functions in the manner similar to forward selection process in linear regression until an overly large model is found.

In the second phase, basis functions are deleted in order of least contribution to the model in the manner similar to backward deletion process in linear regression

Page 12: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 12PricewaterhouseCoopers

Optimal MARS model selection. Degrees of Freedom penalty.

The optimal MARS model is the one with the lowest GCV (generalized cross-validation) measure. The GCV criterion actually does not involve cross validation:

C(M) is the cost-complexity measure of a model containing M basis functions. Note that C(M)=M is the usual measure used in linear regression. The MSE is calculated by dividing the sum of squared errors by N-M instead of by N.

The GCV formula enables C(M) > M; in other words, enables charging each basis function with more than one degree of freedom.

Page 13: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 13PricewaterhouseCoopers

MARS model building. Why this model strategy?

The basis functions and their products are only non-zero over a part of their range, as a result the regression surface is built up parsimoniously, non-zero coefficients are only used locally, only where they are needed (unlike polynomials that produce non-zero product everywhere).

By allowing for any arbitrary shape for the response function as well as for interactions, and by using the two-phase model selection method, MARS is capable of reliably tracking very complex data structures that often hide in high-dimensional data.

Page 14: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 14PricewaterhouseCoopers

How the Use of MARS® Can Expedite GLM Building.

Most of the shortcomings of linear models outlined above can be overcome by using MARS® output basis functions as the input for GLM.This makes sense because MARS®:

• quickly selects important predictors and their interactions and transforms numeric and categorical predictors in such a way that the resulting variables are linearly independent and easy to interpret

• Is minimally affected by multicollinearity, outliers and missing values in the data, easily handles categorical predictors with large numbers of categories and therefore requires less data preparation than linear methods

The modeller though needs to make sure that the transformed predictors make business sense, and that the MARS® model is stable.

Page 15: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 15PricewaterhouseCoopers

Are MARS® output functions OK to use as GLM inputs?

We have seen that although MARS® output functions are not created specifically to be used as the input for a linear model, in practice about 90% of them turn out to be significant predictors in a GLM.

Another feature of the MARS® output functions that makes them useful is that they are linearly independent as stated in Friedman (1991) which means that the multi-collinearity issues do not arise in the GLM that uses them as explanatory variables.

Page 16: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 16PricewaterhouseCoopers

Combined models are easily implemented in SAS

MARS® package is easily available, inexpensive and can work with data in most formats (SAS, SPSS, dbf etc).

The output MARS® produces can be combined with any GLM software with minimal effort as it is easy to code in any program language such as SAS which is the main data analysis software package used by many actuaries.

Page 17: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

i

Case Study. Application of combined models to Queensland Industry CTP data provided by Motor Accident Insurance Commission (MAIC).

Page 18: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 18PricewaterhouseCoopers

Background.

The methodology described above was applied in order to model the ultimate incurred number of claims based on reported claim data.

The data used was industry-wide auto liability data from Queensland (commonly called Compulsory Third Party or “CTP” in Australia).

Page 19: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 19PricewaterhouseCoopers

Data Description.

Individual claim data was aggregated into the number of claims reported for each accident month and development month for input to the GLM.

The variables used for the analysis were: – accident month

– accident quarter

– number of casualties

– development month

– development quarter

– number of vehicles in the calendar year

– number of vehicles exposed in the month

Page 20: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 20PricewaterhouseCoopers

Hand-fitted GLM

An initial GLM was created without using MARS®. This was a Poisson model with the log link, using the number of vehicles exposed in the month as the offset.

The transformations and interactions of the input variables were created manually for the purposes of both best model fit and interpretability.

The model fit was assessed by usual methods such as ratio of deviance to the degrees of freedom, predictor significance, linktest and residual analysis. All the assessments showed adequacy of the model fit.

Page 21: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 21PricewaterhouseCoopers

MARS®- enhanced GLM

A second model was created by pre-processing the variables in MARS® as and then including them in a GLM as the inputs:• built a MARS® model with the ratio of incurred number of claims to number of vehicles exposed in the month as the dependent variable. • we then used basis functions produced by the MARS® model as the inputs in the Poisson model with log link and the number of vehicles exposed in the month as the offset. • the GLM output showed that most of these variables were significant. We used backward elimination to refine the model by excluding the inputs that were not significant. • the resulting model fit was assessed in the same way as for the hand-fitted model.

Page 22: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 22PricewaterhouseCoopers

The model output included explanatory variables pre-processed by MARS® in the form of basis functions:

BF1 = max( 0, LGDV - 0.405);

BF2 = max( 0, 0.405 - LGDV );

BF4 = max( 0, 13.000 - DEVMTHB );

BF5 = max( 0, EXPMONTH - 51.000) * BF4;

BF6 = max( 0, 51.000 - EXPMONTH ) * BF4;

BF59 = max( 0, 16.000 - DEVMTHB ) * BF17;

Page 23: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 23PricewaterhouseCoopers

As you can see, predictors transformed by MARS can be copied and pasted from MARS to SAS directly. They are easy to interpret.

We copy and paste them into SAS and run the following SAS code to add them to the data:

data data1;

set glmcours.data;

BF1 = max(0, LGDV - 0.405);

BF2 = max(0, 0.405 - LGDV );

BF59 = max(0, 16.000 - DEVMTHB ) * BF17;

run ;

Page 24: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 24PricewaterhouseCoopers

SAS code for MARS-enhanced GLM.

PROC GENMOD DATA=DATA1;

model Claims = BF1 BF2 BF4 BF5 BF6 BF7 BF8 BF9 BF11 BF12 BF13 BF14 BF17 BF19 BF20 BF21 BF23 BF24 BF26 BF27 BF30 BF31 BF32 BF34 BF35 BF38 BF40 BF42 BF46 BF49 BF50 BF53 BF55 BF59

/ DIST=POISSON

OFFSET= LGVH

LINK=LOG

TYPE3 ;

scwgt WEIGHT;

OUTPUT OUT= residuals

STDRESDEV=strdev_CLAIMS

XBETA=xbeta_CLAIMS

PREDICTED=pred_CLAIMS;

RUN;

Page 25: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

i

Comparison of Models

Page 26: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 26PricewaterhouseCoopers

Goodness of Fit Comparison.

The fit of both GLMs was assessed by usual methods such as

• ratio of deviance to the degrees of freedom,

• predictor significance,

• link test and

• residual analysis.

Additionally we assessed gains charts for both models.

The MARS®- enhanced GLM has shown a similar if not slightly better fit to the hand-fitted GLM.

Page 27: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 27PricewaterhouseCoopers

MARS-enhanced GLM. Significance of MARS basis functions.

The GLM output showed that most of these variables were significant.

LR Statistics For Type 3 Analysis

Chi-

Source DF Square Pr > ChiSq

BF1 1 166.43 <.0001

BF2 1 167.61 <.0001

BF59 1 43.67 <.0001

We then used backward elimination to refine the model by excluding the inputs that were not significant.

Page 28: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 28PricewaterhouseCoopers

Analysis of actual versus expected values of the number of claims .

The records were ranked from highest to lowest in terms of predicted number of claims for each model, and then segmented into 20 equally sized groups. The average predicted and actual values of the number of claims for each group were then calculated and graphed.

Claim frequency. MARS-enhanced GLM

-

10,000

20,000

30,000

0.05

0.15

0.25

0.35

0.45

0.55

0.65

0.75

0.85

0.95

% of data

nu

mb

er o

f cl

aim

s

Claim frequency. Hand-fitted GLM

-

10,000

20,000

30,000

0.05 0.

20.35 0.

50.65 0.

80.95

% of data

nu

mb

er o

f cl

aim

s

Page 29: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 29PricewaterhouseCoopers

Comparison Of the Fit

The models fit equally well, with the MARS®- enhanced GLM having a marginally better fit.

The hand-fitted GLM slightly over-predicts for the fifth group and under-predicts for the third group

the MARS® - enhanced GLM predicts well for the higher expected numbers of claims but slightly overpredicts for the groups with lower numbers of claims.

the scale of these errors is of little business importance.

Page 30: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 30PricewaterhouseCoopers

Gains chart for number of incurred claims for both models.

The gains chart graph shows that the models perform equally well.

Detailed analysis of the actual tables that gains charts are based on suggests that the MARS® -enhanced GLM performs marginally better than the hand-fitted model.

Gains Chart

0

0.2

0.4

0.6

0.8

1

1.2

Hand-fittedGLMMARS GLM

base line

Page 31: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 31PricewaterhouseCoopers

MARS® -created vs hand-transformed var iables and predictor interactions: similarities and differences.

MARS®-created predictor variables and those which were manually created showed a great degree of similarity. For example, if we compare the predictors based on the variable “development month”, MARS®placed knots mostly at the same points that were found important by the hand-fitted model. The differences included the fact that the hand-fitted model included the variable “minimum (development month, 10)” which is equal to 10 if development month is greater than 10 and is equal to development month otherwise, while MARS® selected 9 rather than 10 as the “knot point”.

MARS® also selected interactions of predictors that were not picked up by the hand-fitted model such as the interaction of development month and experience month.

Page 32: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 32PricewaterhouseCoopers

Interpretability of the models.

Interpretability of the models was similar.

The hand-fitted model was easier to interpret because it included less predictors and less predictor interactions than the MARS®- enhanced model.

Page 33: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 33PricewaterhouseCoopers

Findings and results.

The fit and precision of the models was similar with the MARS®-enhanced GLM showing a slightly better fit than the hand-fitted GLM.

The MARS®-enhanced GLM included predictor interactions not picked up by the hand-fitted model. This effect would be more pronounced in raw data modelling than in modelling summarised data, as the trends observed are likely to be smoother and easier to identify as random variation cancels out.

The hand-fitted model was easier to interpret because it included less predictors and less predictor interactions than the other model.

Building of the MARS®-enhanced GLM was considerably faster and more efficient.

These findings suggest that MARS® is a useful tool to enhance and expedite GLM modelling.

Page 34: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

May 05

Page 34PricewaterhouseCoopers

Future directions.

For large data sets, our team has found that combining decision trees (CART®) with MARS® and GLM proves quite effective as described in Kolyshkina & Brookes, 2002.

Also as an additional check of fit of a GLM model, a parallel model built in MARS® can be used. The model will be built as a part of the stage described previously and will not require additional time. The model equation can be copied and pasted from MARS® to SAS or another package directly.

Comparison of the predicted values of this model to the GLM model graphically and numerically can suggest some insights and inform the choice of the best of the models.

Page 35: Inna Kolyshkina PriceWaterhouseCoopers - Salford Systemsdocs.salford-systems.com/kolyshkinaone.pdf · Inna Kolyshkina PriceWaterhouseCoopers. May 05 Page 2 ewaterhouseCoopers. t :

i

The results described above as well as a number of projects completed by our team for large insurer clients demonstrate thatcombining GLM with data mining methodologies such as MARS® can be very useful for actuarial analysis .