Data Mining Applications in P&C Insurance


Page 1: Data Mining Applications in P&C Insurance

Data Mining Applications in P&C Insurance

CASE Spring Meeting

April 12, 2005

Lijia Guo, PhD, ASA, MAAA

University of Central Florida

Page 2: Data Mining Applications in P&C Insurance

Agenda

• Introduction to data mining modeling
• Understanding the data mining process
• Data mining (DM) techniques
• Applications in P&C insurance
• Case study

Page 3: Data Mining Applications in P&C Insurance

Introduction – What is Data Mining?

• The process of exploring and analyzing large quantities of data in order to discover meaningful patterns and rules.
• Uses a variety of data analysis tools to discover relationships that may be used to make valid predictions.
• It is not a magic wand. You must:
    • Know your business
    • Understand your data
    • Understand the analytical methods

Page 4: Data Mining Applications in P&C Insurance

Introduction - DM Modeling

An information discovery process:

• Knowing your goals
• Understanding your data
• Choosing the right methods
• Understanding the limitations
• Validation and testing
• Making crucial business decisions

Page 5: Data Mining Applications in P&C Insurance

Introduction – DM Process

[Figure: DM process flow. Stages: Define the Goal, Identify Data Sources, Understand the Economics, Prepare Data, Transform Data, Apply DM Models, Validate DM Models, Implement.]

Page 6: Data Mining Applications in P&C Insurance

Introduction – DM Goals

• Identifying responsive potential customers
• Identifying existing customers who are more likely to terminate
• Identifying high-risk purchasers
• Identifying the factors that cause large claims
• Identifying interactions among risk factors

Page 7: Data Mining Applications in P&C Insurance

Introduction – DM Process

Page 8: Data Mining Applications in P&C Insurance

DM Techniques

• Decision trees
• Logistic regression
• Neural networks
• Fuzzy logic
• Genetic algorithms
• Clustering
• Association discovery
• Sequence discovery
• Bayesian analysis
• Visualization
• Hybrid algorithms

Page 9: Data Mining Applications in P&C Insurance

DM Techniques -- Decision Trees

What are decision trees?

• Classify observations based on the values of nominal, binary, or ordinal targets
• Predict outcomes for interval targets
• Predict the appropriate decision when you specify decision alternatives

Page 10: Data Mining Applications in P&C Insurance

DM Techniques -- Decision Trees Example

[Figure: Classification of Surrender Risk. A decision tree with three yes/no tests: "Income > $50,000", "Job > 5 years", and "High debt". Two leaf rules read "if yes, low risk; else, high risk".]
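As a rough illustration of how such a tree classifies a policyholder, the sketch below hard-codes the three yes/no tests from the slide as nested rules. The function name `surrender_risk`, the placement of the job and debt tests under the two income branches, and the direction of the debt leaf are all assumptions; the slide text does not pin down the exact layout.

```python
def surrender_risk(income_over_50k: bool, job_over_5_years: bool, high_debt: bool) -> str:
    """Classify surrender risk with a small hand-built decision tree.

    The branch layout is an assumed reading of the slide, not the author's model.
    """
    if income_over_50k:
        # Assumed branch: higher-income policyholders are split on job tenure.
        return "low" if job_over_5_years else "high"
    # Assumed branch: lower-income policyholders are split on debt level,
    # with high debt treated as the high-risk leaf (an assumption).
    return "high" if high_debt else "low"


# Each path from the root to a leaf is one readable rule.
example = surrender_risk(income_over_50k=True, job_over_5_years=False, high_debt=False)
```

Because every prediction follows a single path of tests, the rule that produced it can be read back directly, which is the interpretability advantage usually claimed for decision trees.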

Page 11: Data Mining Applications in P&C Insurance

DM Techniques -- Decision Trees

Strengths and weaknesses:

• Give insight into the decision-making process
• Efficient, and thus suitable for large data sets
• Relatively unstable
• Have difficulty detecting linear or quadratic relationships

Page 12: Data Mining Applications in P&C Insurance

DM Techniques -- Logistic Regression

• What is logistic regression?
• How logistic regression works
    • Odds ratios
    • Each independent variable affects the logit linearly:

$$\operatorname{logit}(p_i) = \log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}, \quad \text{where } i = 1, 2, \ldots, n.$$
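The logit relationship can be sketched in a few lines of Python. This is a minimal illustration, assuming a model of the form logit(p) = beta0 + sum_j beta_j * x_j; the function names and the coefficient values are hypothetical and are not taken from the presentation.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def predict_probability(beta0: float, betas: list, x: list) -> float:
    """Invert the logit: p = 1 / (1 + exp(-(beta0 + sum_j beta_j * x_j)))."""
    eta = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratio(beta_j: float) -> float:
    """Multiplicative change in the odds for a one-unit increase in x_j."""
    return math.exp(beta_j)


# Hypothetical coefficients: an intercept, a binary indicator, and driver age.
beta0, betas = -2.0, [0.5, -0.03]
p = predict_probability(beta0, betas, [1.0, 40.0])
```

Since each coefficient enters the logit linearly, `odds_ratio(beta_j)` does not depend on the values of the other predictors, which is why odds ratios are the usual way to report the effect of a single risk factor.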

Page 13: Data Mining Applications in P&C Insurance

DM Techniques - Logistic Regression

• Strengths and weaknesses
• Maximum likelihood curve fitting
• Multiple logistic regression model
• Interaction-effect modifier
• Multinomial logistic regression model

Page 14: Data Mining Applications in P&C Insurance

DM Techniques -- Neural Networks

What are neural networks?

[Figure: Network architecture with one hidden layer of two hidden units. Inputs x1, x2, x3 connect to hidden units H1 and H2 through weights w11, w21, w31 and w12, w22, w32; H1 and H2 connect to the output y through weights w1 and w2.]

• Input layer: a unit for each input variable
• Hidden layer: hidden units (neurons)
• Output layer: the target

Page 15: Data Mining Applications in P&C Insurance

DM Techniques – Neural Networks

$$g_0^{-1}(E(y)) = w_0 + w_1 H_1 + w_2 H_2$$
$$H_1 = g_1(w_{01} + w_{11} x_1 + w_{21} x_2 + w_{31} x_3)$$
$$H_2 = g_2(w_{02} + w_{12} x_1 + w_{22} x_2 + w_{32} x_3)$$

• $g_0(\cdot)$: output activation function
• $g_i(\cdot)$: activation functions (nonlinear transformations)
• $w_{11}, w_{21}, \ldots, w_{32}, w_1, w_2$: weights
• $w_0, w_{01}, w_{02}$: biases
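A forward pass through a network of this shape (three inputs, two hidden units, one output) might be sketched as follows. The choice of tanh for the hidden units and the logistic function for the output, the function name `forward`, and all weight values are assumptions made for illustration only.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute E(y) = g0(w0 + w1*H1 + w2*H2) for one observation.

    hidden_weights[j] holds the weights from every input to hidden unit j.
    """
    # H_j = g_j(w_0j + sum_i w_ij * x_i), with g_j assumed to be tanh.
    hidden = [
        math.tanh(bias + sum(w * xi for w, xi in zip(weights, x)))
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Output unit: linear combination of hidden units, then g0 (assumed logistic).
    eta = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return logistic(eta)


# Hypothetical weights for a 3-input, 2-hidden-unit network.
prediction = forward(
    x=[0.2, -1.0, 0.5],
    hidden_weights=[[0.4, -0.3, 0.1], [-0.2, 0.6, 0.8]],
    hidden_biases=[0.1, -0.1],
    output_weights=[1.5, -0.7],
    output_bias=0.05,
)
```

Training would adjust the weights and biases to reduce prediction error; only the prediction step is shown here.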

Page 16: Data Mining Applications in P&C Insurance

DM Techniques – Neural Networks

How neural networks work:

• Processing elements
• Training
• Predicting
• Activation functions
    • Logistic function: $\ell(x) = \frac{1}{1 + e^{-x}}$
    • Hyperbolic tangent: $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
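The two activation functions are closely related: tanh is a rescaled logistic function, tanh(x) = 2 * logistic(2x) - 1. The short check below writes both out from their exponential definitions; the helper names are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Logistic function, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_from_exponentials(x: float) -> float:
    """Hyperbolic tangent written out from its definition, with outputs in (-1, 1)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


# Largest gap between tanh(x) and 2 * logistic(2x) - 1 on a few sample points.
max_gap = max(
    abs(tanh_from_exponentials(x) - (2.0 * logistic(2.0 * x) - 1.0))
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0)
)
```

The practical difference is the output range: (0, 1) suits probability-like outputs, while the zero-centered range (-1, 1) is often preferred for hidden units.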

Page 17: Data Mining Applications in P&C Insurance

DM Techniques -- Neural Networks

Strengths and weaknesses:

• Accurate prediction for complex problems
• Black-box prediction engine
• Overtraining
• Training speed

Page 18: Data Mining Applications in P&C Insurance

DM Techniques -- Hybrid Algorithms

• Problems with standard algorithms
• Advanced algorithms
• Discovery-driven approaches
• Mixture of algorithms

Page 19: Data Mining Applications in P&C Insurance

DM Applications in P&C Insurance

• Data warehouse
• Underwriting
• Pricing/rate making
• Claim scoring
• Risk management
• Policy level analysis
• Variable selection

Page 20: Data Mining Applications in P&C Insurance

Data Warehousing Example

[Figure: Data warehousing flow.
 Sources: hospital claims, pharmacy claims, and physician claims feed an operational data store.
 Primary selection (WHO?): unique patient list.
 Secondary selection (WHAT DATA?): transactions, surveys, demographics.
 Tertiary selection (WHAT DOES THE TRANSACTION DATA TELL US?): derived variables/flags from Rx, med claims, surveys, ..., held as service-level variables in a service-level table.
 Summary (WHAT DO WE KNOW ABOUT THIS PATIENT?): group by patient into summary-level variables in a summary-level table.]

Page 21: Data Mining Applications in P&C Insurance

DM in Insurance Underwriting

• Improving profit margin
• Gaining a competitive edge
• Risk evaluation process
    • Lots of variables
    • Lots of interactions
• Easy-to-follow procedure
• A decision tree can be used

Page 22: Data Mining Applications in P&C Insurance

DM in Insurance Underwriting - Auto Driver’s Claim Information

Variable        Variable Type  Measurement Level  Description
Age             Continuous     Interval           Driver’s age in years
Car age         Continuous     Interval           Age of the car
Car type        Categorical    Nominal            Type of the car
Gender          Categorical    Binary             F = female, M = male
Coverage level  Categorical    Nominal            Policy coverage
Education       Categorical    Nominal            Education level of the driver
Location        Categorical    Nominal            Location of residence
Climate         Categorical    Nominal            Climate code for residence
Credit rating   Continuous     Interval           Credit score of the driver
ID              Input          Nominal            Driver’s identification number
No. of claims   Categorical    Nominal            Number of claims

Page 23: Data Mining Applications in P&C Insurance

DM in Insurance Underwriting - Decision Tree Diagram

Page 24: Data Mining Applications in P&C Insurance

DM in Pricing/Rate Making

• Data: Auto Driver’s Claim Information
• Decision tree analysis to identify risk factors that predict profits, claims, and losses
• Logistic regression applied to model:
    • Claim frequency
    • Effect of each risk factor

Page 25: Data Mining Applications in P&C Insurance

DM in Pricing/Rate Making

[Figure: Effect T-scores from the logistic regression]

Page 26: Data Mining Applications in P&C Insurance

DM in Pricing/Rate Making - Assessment

• Assessment
    • Cross-model comparisons of expected to actual profits/losses
    • Independent of all other factors (sample size, ...)
• Lift charts
    • Compare the % claim-occurrence value with that of a random baseline model
    • Performance quality is shown by the degree to which the lift chart curve pushes upward and to the left
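One common way to compute a single point on a lift chart is sketched below: rank the records by model score, take the top fraction, and compare its claim rate with the overall claim rate (what random selection would give on average). The function name `cumulative_lift` and the toy scores and outcomes are hypothetical, and the code assumes at least one claim in the data.

```python
def cumulative_lift(scores, outcomes, fraction):
    """Claim rate among the top `fraction` of records by score, divided by the overall rate."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:n_top]) / n_top
    base_rate = sum(outcomes) / len(outcomes)  # assumes at least one claim
    return top_rate / base_rate


# Toy data: model scores and whether a claim occurred (1) or not (0).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
outcomes = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_top_20 = cumulative_lift(scores, outcomes, 0.2)
lift_all = cumulative_lift(scores, outcomes, 1.0)
```

A lift above 1 at small fractions means the model concentrates claims among its highest-scored records; by construction the lift falls back to 1 once every record is included.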

Page 27: Data Mining Applications in P&C Insurance

DM in Pricing/Rate Making - Lift Chart for Logistic Regression

Logistic regression:

• Captured 30% of the drivers in the 10th percentile
• Better predictive power from about the 20th to the 80th percentiles

Page 28: Data Mining Applications in P&C Insurance

DM in Risk Management

• Reinsurance
    • To structure more effectively by segmentation
• Hedging
• Target retention and building loyalty

Page 29: Data Mining Applications in P&C Insurance

DM in Policy Level Analysis

• Retention analysis
• Profitability analysis
• Policyholder’s behavior
• DM methods used
    • Neural networks
    • Decision trees
    • Logistic regression

Page 30: Data Mining Applications in P&C Insurance

Applications – Variable Selection

• Problem
    • Given $\{Y, X\}$, where $X = \{x_1, x_2, \ldots, x_N\}$
    • Find $F$ such that $F(X) \approx Y$
    • Find $Z \subset X$ and $F^*$ such that $F^*(Z) \approx Y$
• Improving model accuracy and efficiency
• Making crucial business decisions

Page 31: Data Mining Applications in P&C Insurance

Case Study - Group Insurance

• Identify ways to build upon the current manual rating structure, utilizing existing rating variables, to develop a practical tool to guide underwriting in rate adjustments
• Identify any new rating variables with significant predictive power
    • Data currently gathered but not utilized
    • Transformations of existing variables
    • New rating variables (e.g., external financial data)

Page 32: Data Mining Applications in P&C Insurance

Case Study – Group Insurance

• Profit margin over x-year period
• 128 input variables
• Principal Components Analysis applied
• 42 variables remain
• How to improve business profit?

Page 33: Data Mining Applications in P&C Insurance

Case Study - Goals

• Developing a practical underwriting tool
• Detecting deviations
• Identifying key drivers
• Improving model predictive power
• Risk selection

Page 34: Data Mining Applications in P&C Insurance

Function Approximation

$$F(X) = F_0 + \beta_1 T_1(X) + \beta_2 T_2(X) + \cdots + \beta_M T_M(X)$$

• $F_0$ is the initial guess
• Stagewise approximation
• Each stage is added to reduce the errors
• Each stage is a weak learner: a small tree
• Sequential adjustment

Page 35: Data Mining Applications in P&C Insurance

Regression Tree Example

Profit = 6.5%
    + 0.8% if AS > 421, - 0.5% otherwise
    + 1.2% if male younger than 30, - 1.1% otherwise
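The example reads as a base value plus the outputs of two small trees, which translates directly into code. The function and argument names are hypothetical; the numbers and the threshold on AS are the ones given on the slide.

```python
def predicted_profit(as_value: float, is_male: bool, age: float) -> float:
    """Predicted profit margin in percent: initial guess plus two one-split trees."""
    base = 6.5                                          # initial guess, F_0
    stage_1 = 0.8 if as_value > 421 else -0.5           # first small tree: split on AS
    stage_2 = 1.2 if (is_male and age < 30) else -1.1   # second small tree: young male or not
    return base + stage_1 + stage_2


young_male_high_as = predicted_profit(as_value=500, is_male=True, age=25)
older_female_low_as = predicted_profit(as_value=400, is_male=False, age=45)
```

Each stage only nudges the running prediction up or down, which is what makes a sum of many small trees a stagewise approximation.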

Page 36: Data Mining Applications in P&C Insurance

Function Approximation

GIVEN

• Y: output
• X: inputs or predictors
• L(Y, F): loss function

ESTIMATE

$$F^*(X) = \arg\min_{F(X)} E_{Y,X}\left[L(Y, F(X))\right]$$
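To make the role of the loss function concrete, the sketch below minimizes the average loss over a sample when F is restricted to a single constant. Under squared-error loss the best constant is the sample mean; under absolute-error loss it is the sample median. The data, the candidate grid, and the function names are hypothetical.

```python
import statistics


def squared_loss(y: float, f: float) -> float:
    return (y - f) ** 2


def absolute_loss(y: float, f: float) -> float:
    return abs(y - f)


def empirical_risk(y_values, constant, loss) -> float:
    """Sample average of the loss when every observation is predicted by `constant`."""
    return sum(loss(y, constant) for y in y_values) / len(y_values)


y_values = [1.0, 2.0, 2.0, 3.0, 12.0]
candidates = [i / 10 for i in range(0, 151)]  # grid search over 0.0, 0.1, ..., 15.0

best_squared = min(candidates, key=lambda c: empirical_risk(y_values, c, squared_loss))
best_absolute = min(candidates, key=lambda c: empirical_risk(y_values, c, absolute_loss))
```

Here `best_squared` lands on the mean and `best_absolute` on the median, so the single large value pulls the squared-error answer much further than the absolute-error one. The choice of L therefore changes what "best" F means.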

Page 37: Data Mining Applications in P&C Insurance

Classical Function Approximation

$$F = F(X, B), \quad B = \{\beta_j\}$$

Solve $\{\beta_j\}$ from

$$\min_{B} L(Y, F(X, B))$$

Page 38: Data Mining Applications in P&C Insurance

Nonparametric Function Approximation

• Initial guess: $\{F_0(X_i)\}$
• Compute the gradient $\vec{g} = \left\{ \frac{\partial L}{\partial F(X_i)} \right\}_{i=1}^{N}$
• Take a step in the steepest descent direction, $-\vec{g}$

Page 39: Data Mining Applications in P&C Insurance

Gradient Boosting

Loss:

$$L(\{F(X_i)\}) = \frac{1}{N} \sum_{i=1}^{N} (Y_i - F(X_i))^2$$

• Initial guess: $\{F_0(X_i)\}$
• FOR m = 1 TO M
    • Compute the gradient $g_m = \nabla L(\{F_{m-1}(X_i)\})$; for this squared-error loss, the current residuals $Y_i - F_{m-1}(X_i)$ are proportional to $-g_m$
    • Fit an L-node regression tree to the current residuals
    • For each node, calculate the node average residual $h_m(X_i)$
    • Update: $\{F_m(X_i)\} = \{F_{m-1}(X_i)\} + h_m(X_i)$
• END
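The loop above can be sketched in plain Python for squared-error loss with a single predictor. This is a minimal illustration, not the software used in the case study: it assumes 2-node trees (one split each), no shrinkage factor, at least two distinct x values, and hypothetical names and data throughout.

```python
def fit_stump(x, residuals):
    """Fit a 2-node regression tree: choose the split with the lowest squared error;
    each node predicts the average residual of the observations that fall in it."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # candidate splits between distinct x values
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def gradient_boost(x, y, n_stages=20):
    """Return a model F_M(x) = F_0 + h_1(x) + ... + h_M(x) fitted stage by stage."""
    f0 = sum(y) / len(y)                 # initial guess: the mean of y
    stages, predictions = [], [f0] * len(y)
    for _ in range(n_stages):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        h = fit_stump(x, residuals)      # small tree fitted to the current residuals
        stages.append(h)
        predictions = [pi + h(xi) for pi, xi in zip(predictions, x)]  # update step
    return lambda xi: f0 + sum(h(xi) for h in stages)


def mean_squared_error(model, x, y):
    return sum((yi - model(xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Toy data with a jump between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
baseline_error = mean_squared_error(gradient_boost(x, y, n_stages=0), x, y)
boosted_error = mean_squared_error(gradient_boost(x, y, n_stages=20), x, y)
```

Practical implementations usually multiply each stage by a small learning rate and validate on held-out data to limit overfitting; both are omitted here to keep the update identical to the one on the slide.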

Page 40: Data Mining Applications in P&C Insurance

Case Study

[Figure: Two-predictor partial dependence for PROFIT_MARGIN. Two panels plot partial dependence against AVG_SALARY (2,000 to 14,000), one for the slice REGION = 1.08 and one for the slice REGION = 1.00.]

Page 41: Data Mining Applications in P&C Insurance

Case Study

[Figure: Two-predictor partial dependence for PROFIT_MARGIN. Two panels plot partial dependence against AVG_SALARY (5,000 to 25,000), one for the slice SIZE = 1.00 and one for the slice SIZE = 1.03.]

Page 42: Data Mining Applications in P&C Insurance

Case Study - Single Stats and Variable Importance

Input       Additive  Multiplicative  Importance
Variable 1  0.2679    0.2690          100.00
Variable 2  0.2779    0.3203           75.23
Variable 3  0.1456    0.1771           54.65
Variable 4  0.2263    0.2469           47.41
Variable 5  0.1059    0.1425           42.81
Variable 6  0.2741    0.2847           34.81
Variable 7  0.1289    0.1306           34.27
Variable 8  0.0797    0.0864           25.35
Variable 9  0.1129    0.1148           23.37

Page 43: Data Mining Applications in P&C Insurance

Case Study - Pair Stats and Variable Importance

Variables                Additive  Multiplicative
Variable 1 & Variable 2  0.3714    0.3847
Variable 2 & Variable 3  0.3704    0.4066
Variable 2 & Variable 4  0.3686    0.4010
Variable 2 & Variable 7  0.3401    0.3856
Variable 3 & Variable 4  0.2795    0.3137
Variable 3 & Variable 6  0.2895    0.3082
Variable 4 & Variable 7  0.2417    0.2592
Variable 5 & Variable 6  0.2622    0.2766
Variable 6 & Variable 7  0.2904    0.3066

Page 44: Data Mining Applications in P&C Insurance

Predictive Modeling

• Predicts deviations from expected profitability (used 9 variables)
• Practical guide for underwriters to use for rate adjustments
• New variables identified to have strong predictive power
• Improve business profit (20% profit margin)

Page 45: Data Mining Applications in P&C Insurance

Importance of Multiple Techniques

• Robust model with high predictive accuracy
• Practical constraints
• Algorithm complexity
• Ease of understanding of results

Page 46: Data Mining Applications in P&C Insurance

Is Data Mining for you?

• Defining the goals
• Understanding your data
• Using multiple techniques
• Improving your decision-making process
• Gaining a competitive edge!

Thank you!