TRANSCRIPT
Predictive Modeling
Spring 2005 CAMAR meeting
Louise Francis, FCAS, MAAA
Francis Analytics and Actuarial Data Mining, Inc.
www.data-mines.com
2
Objectives
Introduce predictive modeling
Why use it?
Describe some methods in depth: trees, neural networks, clustering
Apply to fraud data
3
Predictive Modeling Family
Predictive Modeling
Classical Linear Models GLMs Data Mining
4
Why Predictive Modeling?
Better use of insurance data
Advanced methods for dealing with messy data now available
5
Major Kinds of Modeling
Supervised learning
  Most common situation
  A dependent variable: frequency, loss ratio, fraud/no fraud
  Some methods: regression, CART, some neural networks
Unsupervised learning
  No dependent variable
  Groups like records together; a group of claims with similar characteristics might be more likely to be fraudulent
  Some methods: association rules, k-means clustering, Kohonen neural networks
6
Two Big Specialties in Predictive Modeling
GLMs
Regression, Logistic Regression, Poisson Regression
[Figure: Distribution for Severity (mean = 100,047.5) and Distribution for Claims (mean = 1.001)]
Data Mining
Trees Neural Networks Clustering
7
Modeling Process
Internal Data + External Data → Data Cleaning → Other Preprocessing → Build Model → Validate Model → Test Model → Deploy Model
8
Data Complexities Affecting Insurance Data
Nonlinear functions
Interactions
Missing data
Correlations
Non-normal data
9
Kinds of Applications
Classification Prediction
10
The Fraud Study Data
• 1993 Automobile Insurers Bureau closed Personal Injury Protection claims
• Dependent variables
  • Suspicion score: a number from 0 to 10, an expert assessment of the likelihood of fraud or abuse
  • 5 categories, used to create a binary indicator
• Predictor variables
  • Red flag indicators
  • Claim file variables
11
Introduction of Two Methods
Trees: sometimes known as CART (Classification and Regression Trees)
Neural networks: will introduce the backpropagation neural network
12
Decision Trees
Recursively partition the data
Often sequentially bifurcate the data, but can split into more groups
Apply a goodness-of-fit statistic to select the best partition at each step
Select the partition which results in the largest improvement to the goodness-of-fit statistic
13
Goodness of Fit Statistics
Chi-square: CHAID (Fish, Gallagher, Monroe; Discussion Paper Program, 1990)

$$\chi^2 = \sum_{i,k} \frac{(\text{Observed}_{ik} - \text{Expected}_{ik})^2}{\text{Expected}_{ik}}$$

Deviance: CART

$$D = -2 \sum_{i,k} n_{ik} \log(p_{ik}) \quad \text{(categorical)}$$

$$D = \sum_i (y_i - \hat{y}_i)^2 \quad \text{(or RSS for continuous variables)}$$
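As an illustrative sketch (not part of the original deck), the chi-square statistic for a two-way split can be computed directly from the observed counts; the table here uses the legal-representation-vs-fraud counts from the fraud-study example later in this deck.

```python
# Chi-square goodness-of-fit statistic for a two-way split:
# sum over cells of (Observed - Expected)^2 / Expected,
# with Expected = row total * column total / grand total.
def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    return stat

# Legal representation (rows) vs. no fraud / fraud (columns)
print(round(chi_square([[626, 80], [269, 425]]), 1))
```

A large statistic relative to its chi-square reference distribution indicates the split separates the classes well.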
14
Goodness of Fit Statistics
Gini measure: CART (i is the impurity measure)

$$i(t) = 1 - \sum_k p_k^2$$

$$\Delta(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R)$$
15
Goodness of Fit Statistics
Entropy: C4.5

$$I(E) = \log_2(N) - \log_2(n_E) = -\log_2 p(E)$$

$$H = -\sum_k p_k \log_2(p_k)$$
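A minimal sketch of the entropy criterion (the `entropy` helper name is ours, not the deck's), applied to the root-node fraud/no-fraud counts from the example that follows:

```python
import math

# Entropy impurity: H = -sum_k p_k * log2(p_k), the C4.5 criterion.
# Zero-count classes contribute nothing (the p*log p limit is 0).
def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# No fraud / fraud counts at the root node of the fraud example
print(round(entropy([895, 505]), 3))
```

Entropy is maximized (1 bit, for two classes) when the classes are evenly mixed and falls to 0 for a pure node.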
16
An Illustration from Fraud data: GINI Measure
Legal Representation vs. Fraud/No Fraud

Legal Rep | No Fraud | Fraud | Total
No        | 626      | 80    | 706
Yes       | 269      | 425   | 694
Total     | 895      | 505   | 1400
Percent   | 64%      | 36%   |
17
First Split
All claims: p(fraud) = 0.36
Legal Rep = Yes: p(fraud) = 0.612
Legal Rep = No: p(fraud) = 0.113
18
Example cont:
Root node Gini impurity: 0.461

Legal Rep | p(No Fraud) | p(Fraud) | 1 − Σp² | Row %
No        | 0.887       | 0.113    | 0.201   | 50.4%
Yes       | 0.388       | 0.612    | 0.475   | 49.6%

Weighted impurity after the split: 0.337 = 0.201 × 0.504 + 0.475 × 0.496
Improvement: 0.461 − 0.337 = 0.124
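The Gini arithmetic above can be reproduced in a few lines of Python (an illustrative sketch; the variable names are ours):

```python
# Gini impurity i(t) = 1 - sum_k p_k^2, and the CART split improvement
# delta(s, t) = i(t) - p_L * i(t_L) - p_R * i(t_R).
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

root = [895, 505]       # no fraud, fraud: all 1400 claims
no_rep = [626, 80]      # legal rep = no (706 claims)
with_rep = [269, 425]   # legal rep = yes (694 claims)

n = sum(root)
weighted = (sum(no_rep) / n) * gini(no_rep) + (sum(with_rep) / n) * gini(with_rep)
improvement = gini(root) - weighted
# Slide values 0.461, 0.337, 0.124 come from rounding before subtracting
print(round(gini(root), 3), round(weighted, 3), round(improvement, 3))
```

The tree-growing algorithm would evaluate this improvement for every candidate split and keep the largest.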
19
Example of a Nonlinear Function: Suspicion Score vs. 1st Provider Bill
[Figure: Neural Network Fit of SUSPICION vs. Provider Bill — predicted suspicion score (0 to 4) against provider bill (1,000 to 7,000)]
20
An Approach to Nonlinear Functions: Fit a Tree
[Tree: splits on mp1.bill at 1279.5, 153, 842.5, and 2389; leaf predictions 0.3387, 1.2850, 2.2550, 3.6430, 4.4270]
21
Fitted Curve From Tree
[Figure: step-function fraud score prediction (1 to 4) against provider bill (0 to 15,000)]
22
Neural Networks
Developed by artificial intelligence experts, but now also used by statisticians
Based on how neurons function in the brain
23
Neural Networks
• Fit by minimizing the squared deviation between fitted and actual values
• Can be viewed as non-parametric, non-linear regression
• Often thought of as a “black box”
  • Due to the complexity of the fitted model, it is difficult to understand the relationship between the dependent and predictor variables
24
The Backpropagation Neural Network
Three-Layer Neural Network
Input Layer (input data) → Hidden Layer (processes data) → Output Layer (predicted value)
25
Neural Network
Fits a nonlinear function at each node of each layer
$$h = f(X; w_0, \ldots, w_n) = f(w_0 + w_1 x_1 + \cdots + w_n x_n) = \frac{1}{1 + e^{-(w_0 + w_1 x_1 + \cdots + w_n x_n)}}$$
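A minimal sketch of a single hidden node under this formula (function names are ours, for illustration):

```python
import math

# One hidden-layer node: h = f(w0 + w1*x1 + ... + wn*xn),
# where f is the logistic (sigmoid) function 1 / (1 + e^-z).
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_node(x, w):
    # w[0] is the bias weight w0; w[1:] multiply the inputs
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return logistic(z)

print(hidden_node([0.0], [0.0, 1.0]))  # logistic(0) = 0.5
```

Backpropagation adjusts the weights w to minimize the squared deviation between these node outputs (combined at the output layer) and the actual values.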
26
The Logistic Function
[Figure: Logistic Function for Various Values of w1 (w1 = −10, −5, −1, 1, 5, 10), plotted for x from −1.2 to 0.8]
27
Universal Function Approximator
• The backpropagation neural network with one hidden layer is a universal function approximator
• Theoretically, with a sufficient number of nodes in the hidden layer, any continuous nonlinear function can be approximated
28
Nonlinear Function Fit by Neural Network
[Figure: Neural Network Fit of SUSPICION vs. Provider Bill — fitted suspicion score (0 to 4) against provider bill (1,000 to 7,000)]
29
Interactions
The functional relationship between a predictor variable and the dependent variable depends on the value of one or more other variables
[Figure: Neural Network Predicted for Provider Bill and Injury Type — five panels (inj.type 01 through 05) of predicted value (0 to 6) against provider bill (3,000 to 18,000)]
30
Interactions
Neural networks: the hidden nodes play a key role in modeling the interactions
CART: partitions the data; the partitions capture the interactions
31
Simple Tree of Injury and Provider Bill
[Tree: splits on mp1.bill at 1279.5, 153, 2017.5, and 2675.5 and on injury-type subsets; leaf predictions range from 0.14 to 4.80]
32
[Figure: tree response against mp1.bill (4,000 to 16,000), one panel per injury type (1, 2, 4, 5, 6, 7, 8, 10, 99)]
33
Missing Data
Occurs frequently in insurance data
There are some sophisticated methods for addressing this (e.g., the EM algorithm)
CART finds surrogates for variables with missing values
Neural networks have no explicit procedure for missing values
34
More Complex Example
Dependent variable: expert's assessment of the likelihood that the claim is legitimate; a classification application
Predictor variables: a combination of claim file variables (age of claimant, legal representation) and red flag variables (injury is strain/sprain only, claimant has history of previous claims)
Used an enhancement of CART known as boosting
35
Red Flag Predictor Variables

Subject     | Indicator | Description
Accident    | ACC01     | No report by police officer at scene
Accident    | ACC04     | Single vehicle accident
Accident    | ACC09     | No plausible explanation for accident
Accident    | ACC10     | Claimant in old, low-valued vehicle
Accident    | ACC11     | Rental vehicle involved in accident
Accident    | ACC14     | Property damage was inconsistent with accident
Accident    | ACC15     | Very minor impact collision
Accident    | ACC16     | Claimant vehicle stopped short
Accident    | ACC19     | Insured felt set up, denied fault
Claimant    | CLT02     | Had a history of previous claims
Claimant    | CLT04     | Was an out-of-state accident
Claimant    | CLT07     | Was one of three or more claimants in vehicle
Injury      | INJ01     | Injury consisted of strain or sprain only
Injury      | INJ02     | No objective evidence of injury
Injury      | INJ03     | Police report showed no injury or pain
Injury      | INJ05     | No emergency treatment was given
Injury      | INJ06     | Non-emergency treatment was delayed
Injury      | INJ11     | Unusual injury for auto accident
Insured     | INS01     | Had history of previous claims
Insured     | INS03     | Readily accepted fault for accident
Insured     | INS06     | Was difficult to contact/uncooperative
Insured     | INS07     | Accident occurred soon after effective date
Lost Wages  | LW01      | Claimant worked for self or a family member
Lost Wages  | LW03      | Claimant recently started employment
36
Claim File Variables
Variable Description
AGE Age of claimant
RPTLAG Lag from date of accident to date reported
TREATLAG Lag from date of accident to earliest treatment by service provider
AMBUL Ambulance charges
PARTDIS The claimant partially disabled
TOTDIS The claimant totally disabled
LEGALREP The claimant represented by an attorney
Claim Variables Available Early in Life of Claim
37
Neural Network Measure of Variable Importance
• Look at the weights to the hidden layer
• Compute sensitivities: a measure of how much the predicted value's error increases when the variables are excluded from the model one at a time
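A hedged sketch of the sensitivity idea: hold one variable at its mean, re-predict, and record how much the squared error grows. The simple linear `predict` stand-in and all names here are ours for illustration; the deck's actual model is a fitted neural network.

```python
# Sensitivity measure sketch: "neutralize" one variable at a time
# (replace it with its mean) and record the increase in mean squared
# error of the predictions. The bigger the increase, the more the
# model relied on that variable.
def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)

def predict(rows, weights):
    # Stand-in model: a simple linear predictor, not a neural network
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]

def sensitivities(rows, actual, weights):
    base = mse(predict(rows, weights), actual)
    out = []
    for j in range(len(weights)):
        mean_j = sum(r[j] for r in rows) / len(rows)
        fixed = [r[:j] + [mean_j] + r[j + 1:] for r in rows]
        out.append(mse(predict(fixed, weights), actual) - base)
    return out

rows = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
actual = [2.0, 0.5, 2.5, 0.0]
s = sensitivities(rows, actual, [2.0, 0.5])
print(s.index(max(s)))  # variable 0 matters most here
```

Ranking the variables by this error increase gives an importance ordering like the one on the next slide.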
38
Variable Importance
Rank | Variable | Importance
1    | LEGALREP | 100.0
2    | TRTLAG   | 69.7
3    | AGE      | 54.5
4    | ACC04    | 44.4
5    | INJ01    | 42.1
6    | INJ02    | 39.4
7    | ACC14    | 35.8
8    | RPTLAG   | 32.4
9    | AMBUL    | 29.3
10   | CLT02    | 23.9
39
Testing: Hold Out Part of Sample
• Fit model on 1/2 to 2/3 of data
• Test fit of model on remaining data
• Need a large sample
40
Testing: Cross-Validation
• Hold out 1/n (say 1/10) of the data
• Fit the model to the remaining data
• Test on the held-out portion of the sample
• Do this n (say 10) times and average the results
• Used for moderate sample sizes
• Jackknifing is similar to cross-validation
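The steps above can be sketched as a generic loop (an illustrative sketch, assuming caller-supplied `fit` and `score` functions; the mean-only toy model is ours):

```python
import random

# n-fold cross-validation skeleton: hold out each fold in turn,
# fit on the rest, score on the held-out fold, then average.
def cross_validate(data, n_folds, fit, score):
    data = list(data)
    random.Random(0).shuffle(data)       # fixed seed for repeatability
    folds = [data[i::n_folds] for i in range(n_folds)]
    results = []
    for i in range(n_folds):
        holdout = folds[i]
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        model = fit(train)
        results.append(score(model, holdout))
    return sum(results) / n_folds

# Toy use: the "model" is just the training mean, scored by squared error
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
fit = lambda train: sum(train) / len(train)
score = lambda m, hold: sum((x - m) ** 2 for x in hold) / len(hold)
avg_err = cross_validate(data, 5, fit, score)
print(avg_err > 0)
```

Because every observation is held out exactly once, the averaged score estimates out-of-sample error without needing the large sample a single holdout split requires.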
41
Results of Classification on Test Data
Fitted Tree
Actual | 0     | 1
0      | 77.3% | 22.7%
1      | 14.3% | 85.7%

Fitted Neural Network
Actual | 0     | 1
0      | 81.5% | 18.5%
1      | 26.7% | 73.3%
42
Unsupervised Learning
Common method: clustering
No dependent variable; records are grouped into classes with similar values on the variables
Start with a measure of similarity or dissimilarity
Maximize dissimilarity between members of different clusters
43
Dissimilarity (Distance) Measures
Euclidean distance:

$$d_{ij} = \left( \sum_{k=1}^{m} (x_{ik} - x_{jk})^2 \right)^{1/2}$$

Manhattan distance:

$$d_{ij} = \sum_{k=1}^{m} |x_{ik} - x_{jk}|$$

where i, j index records and k indexes the variables.
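Both distance measures are a few lines of Python (an illustrative sketch; the function names are ours):

```python
# Euclidean and Manhattan dissimilarity between two records
# x_i and x_j measured on m variables.
def euclidean(xi, xj):
    return sum((a - b) ** 2 for a, b in zip(xi, xj)) ** 0.5

def manhattan(xi, xj):
    return sum(abs(a - b) for a, b in zip(xi, xj))

a = [0.0, 0.0]
b = [3.0, 4.0]
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0
```

A clustering method such as k-means would use one of these to decide which records belong together.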
44
Binary Variables
                Column Variable
Row Variable | 1   | 0   | Total
1            | a   | b   | a+b
0            | c   | d   | c+d
Total        | a+c | b+d |
45
Binary Variables
Simple matching:

$$d = \frac{b + c}{a + b + c + d}$$

Rogers and Tanimoto:

$$d = \frac{2(b + c)}{(a + d) + 2(b + c)}$$
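Both coefficients follow directly from the a, b, c, d cell counts of the preceding 2×2 table (an illustrative sketch; the function names and sample counts are ours):

```python
# Binary dissimilarities from the 2x2 match counts:
#   a = both 1, b = row 1 / col 0, c = row 0 / col 1, d = both 0.
def simple_matching(a, b, c, d):
    # proportion of mismatched variables
    return (b + c) / (a + b + c + d)

def rogers_tanimoto(a, b, c, d):
    # mismatches double-weighted relative to matches
    return 2 * (b + c) / (a + d + 2 * (b + c))

# Two records agreeing on 6 of 10 binary red-flag variables
print(simple_matching(4, 2, 2, 2))   # 0.4
print(rogers_tanimoto(4, 2, 2, 2))
```

Rogers and Tanimoto always gives a value at least as large as simple matching, since it counts each mismatch twice.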
46
Results for 2 Clusters
Cluster | Lawyer | Back or Sprain Claim | Chiro or PT | Prior Claim
1       | 77%    | 73%                  | 56%         | 26%
2       | 3%     | 29%                  | 14%         | 1%

Cluster | Suspicious Claim | Average Suspicion Score
1       | 56%              | 2.99
2       | 3%               | 0.21
47
Beginner's Library
Berry, Michael J. A., and Linoff, Gordon, Data Mining Techniques, John Wiley and Sons, 1997
Kaufman, Leonard, and Rousseeuw, Peter, Finding Groups in Data, John Wiley and Sons, 1990
Smith, Murray, Neural Networks for Statistical Modeling, International Thomson Computer Press, 1996