Addressing Analytics Challenges in the Retail and Insurance … · 2011-03-21
Addressing Analytics Challenges in the Insurance Industry
Noe Tuason
California State Automobile Association
Overview
• Two Challenges:
1. Identifying High/Medium-Profit Prospects at High/Low Risk of Flight in the Company’s Internal Customer Database
2. Finding New Factors to Improve the Pricing Model
• The methodologies also apply to the financial, retail, and other industries
Identifying High-Profit Prospects at Low/High Risk of Flight in Our Customer Database
Challenge 1
Segmenting High Profit and Low Risk of Flight Customers
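Columns: Profitability (Loss Ratio Score). Rows: Risk of Flight.

| Risk of Flight | High Profitability | Medium Profitability | Low Profitability |
|---|---|---|---|
| Low | 1: High Profit, Stable | 2: Medium Profit, Stable | 3: Low Profit, Stable |
| High | 4: High Profit, Likely to Leave | 5: Medium Profit, Likely to Leave | 6: Low Profit, Likely to Leave |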
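To make the grid concrete, the sketch below places one customer in a cell from a profitability tier and a risk-of-flight flag. It assumes profitability comes from a loss ratio (losses divided by premium, where lower means more profitable); the cutoffs 0.6 and 0.9 are hypothetical placeholders, not values from this presentation.

```python
def profit_tier(loss_ratio: float, high_cut: float = 0.6, low_cut: float = 0.9) -> str:
    """Bucket a loss ratio into High/Medium/Low profit.

    The cutoffs are hypothetical; a lower loss ratio means a more profitable customer.
    """
    if loss_ratio < high_cut:
        return "High"
    if loss_ratio < low_cut:
        return "Medium"
    return "Low"


def segment(tier: str, high_flight_risk: bool) -> int:
    """Return grid cell 1-6: columns are High/Medium/Low profit, rows are low/high risk of flight."""
    column = {"High": 1, "Medium": 2, "Low": 3}[tier]
    return column + 3 if high_flight_risk else column


print(segment(profit_tier(0.45), high_flight_risk=False))  # 1: High Profit, Stable
print(segment(profit_tier(0.75), high_flight_risk=True))   # 5: Medium Profit, Likely to Leave
```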
Methodology for determining Risk of Flight:
Logistic Regression Using Insurance Customer Data
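A minimal sketch of the risk-of-flight step, assuming a plain logistic regression fit by batch gradient descent. The single feature (years of tenure), the toy data, and the learning-rate settings are hypothetical; a real model would use a statistics package and many customer variables, and the predicted probability would then be thresholded to choose the Stable or Likely to Leave row.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by minimizing the average log loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def flight_probability(w, b, x) -> float:
    """Predicted probability that a customer with features x leaves."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: years of tenure -> left (1) or stayed (0).
X = [[0.0], [0.5], [1.0], [2.0], [2.5], [3.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(round(flight_probability(w, b, [0.2]), 2), round(flight_probability(w, b, [2.8]), 2))
```

Gradient descent is used only to keep the sketch dependency-free; any maximum-likelihood logistic regression routine fits the same model.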
Challenge: In the prospect database, identify the Stable High/Medium Profit customers (segments 1 and 2) and the Likely to Leave High/Medium Profit customers (segments 4 and 5), and differentiate both from the Low Profit customers (segments 3 and 6).
Paradigm for Targeting High/Medium Profit and Low/High Risk of Flight Prospects in the Members’ Database
• Insurance Customers (Model): Insurance Customer Segments; Membership Variables M1, M2, ..., Mn; Demographics P1, P2, ..., Pn
• Members Database (Score): Membership Variables M1, M2, ..., Mn; Demographics P1, P2, ..., Pn
• For prospecting in external databases
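The paradigm (fit on insurance customers whose segment is known, then score members using only the variables both databases share) can be sketched with a deliberately simple stand-in for the real model: the most common segment for each profile of shared variables, with an overall fallback for unseen profiles. This lookup is not CART; the profile fields and values are hypothetical.

```python
from collections import Counter, defaultdict


def fit_profile_model(customers):
    """customers: (profile, segment) pairs, where profile is a tuple of shared variables.

    Returns the most common segment per profile and an overall fallback segment
    for profiles never seen among customers."""
    by_profile = defaultdict(Counter)
    for profile, seg in customers:
        by_profile[profile][seg] += 1
    model = {p: counts.most_common(1)[0][0] for p, counts in by_profile.items()}
    fallback = Counter(seg for _, seg in customers).most_common(1)[0][0]
    return model, fallback


def score_members(model, fallback, member_profiles):
    """Assign a predicted segment to each member (non-customer) profile."""
    return [model.get(p, fallback) for p in member_profiles]


# Hypothetical profiles: (age band, membership status)
customers = [
    (("55+", "active"), 1), (("55+", "active"), 1), (("55+", "active"), 1),
    (("18-34", "lapsed"), 5), (("18-34", "lapsed"), 5), (("18-34", "lapsed"), 4),
]
model, fallback = fit_profile_model(customers)
print(score_members(model, fallback, [("55+", "active"), ("18-34", "lapsed"), ("35-54", "active")]))
# [1, 5, 1]
```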
Differentiating Between the 3 Groups Within the Non-Insureds in the Prospect Database (AAA Members) Using CART
Draw a sample of 10,000 insureds with known segments and append the following variables for modeling:
• Demographics
• Lifestage
• Membership Variables
• Transaction Variables
Then run CART.
The six segments from the Profitability by Risk of Flight grid are collapsed into the three target groups: Stable H/M Profit (segments 1 and 2), High Risk H/M Profit (segments 4 and 5), and Low Profit (segments 3 and 6).
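A sketch of assembling the modeling sample, assuming insureds are keyed by a member ID and the appended variables sit in a separate lookup. The six segments are collapsed into the three CART target groups, a grouping inferred from the segment grid and the classes in the validation results; the data layout and the fixed seed are hypothetical.

```python
import random

# Three CART target groups, collapsed from the six grid segments.
GROUP_OF_SEGMENT = {
    1: "Stable H/M Profit", 2: "Stable H/M Profit",
    4: "High Risk H/M Profit", 5: "High Risk H/M Profit",
    3: "Low Profit", 6: "Low Profit",
}


def build_modeling_sample(insureds, appended, n=10_000, seed=42):
    """insureds: member_id -> segment (1-6); appended: member_id -> dict of
    demographic, lifestage, membership, and transaction variables (hypothetical layout)."""
    rng = random.Random(seed)
    ids = rng.sample(sorted(insureds), min(n, len(insureds)))
    rows = []
    for member_id in ids:
        row = {"target": GROUP_OF_SEGMENT[insureds[member_id]]}
        # Members without appended data keep only the target; CART can handle missing values.
        row.update(appended.get(member_id, {}))
        rows.append(row)
    return rows
```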
Decision to use CART over Multinomial Logit or Discriminant Analysis
• Handles discrete or continuous target variables
• Makes no linearity or normality assumptions
• Handles categorical predictors without the need to create dummy variables
• Can treat missing values as a valid category, so no imputation is needed
• Provides surrogate and competitor variables, another way of handling missing values
• Automatic: selects the splitting variables and split points itself
• Grows an overly large tree, then prunes it back and recommends the best tree
• Shows hierarchical interactions and the impact of these interactions
• Gives the relative importance of variables
• Includes self-validation to avoid overfitting: holdout and n-fold cross-validation
• Offers alternative splitting criteria depending on the structure of the data
• Can assign a higher penalty to specific misclassifications, e.g., misclassifying low-risk cases
CART is an acronym for Classification and Regression Trees, a decision-tree procedure introduced in 1984 by world-renowned UC Berkeley and Stanford statisticians Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone.
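The core of CART's tree growing is an exhaustive search for the split that most reduces impurity. The sketch below shows only that step for one numeric predictor with the Gini criterion; it omits surrogate splits, pruning, cross-validation, and misclassification costs, and the example ages and groups are hypothetical.

```python
from collections import Counter


def gini(labels) -> float:
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric predictor that minimizes weighted Gini impurity.

    Returns (threshold, impurity) for the rule 'value <= threshold goes left';
    threshold is None if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # start from the unsplit node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, impurity)
    return best


print(best_split([22, 25, 30, 45, 50, 60], ["Low", "Low", "Low", "Stable", "Stable", "Stable"]))
# (37.5, 0.0)
```

A full tree applies this search to every predictor at every node, keeps the best split, and recurses on both children.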
Variables’ Relative Importance

| Variable | Relative Importance |
|---|---|
| `SAMP_AGE$` | 100.00 |
| `LIFETIME_ERS_COUNT$` | 53.38 |
| `WEALTH$` | 41.58 |
| `ETHNCITY$` | 35.03 |
| `INCOME_BRACKET$` | 34.22 |
| `LIFESTAGE$` | 34.03 |
| `LENGTH_RESIDENCE$` | 33.30 |
| `MBS_STATUS$` | 21.50 |
| `EDUCATION$` | 19.86 |
| `GENDER$` | 14.49 |
| `MARITAL$` | 6.65 |
| `MBS_PROGRAM$` | 4.64 |
| `HAS_KIDS$` | 2.99 |
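These scores are relative: the top variable is set to 100 and the others are expressed as a percentage of it. A sketch of that rescaling, assuming raw importance scores are already available (in CART they accumulate the improvement contributed by each variable's splits, including surrogate splits); the variable names and raw values are hypothetical.

```python
def relative_importance(raw_scores: dict) -> dict:
    """Rescale raw importance scores so the most important variable scores 100."""
    top = max(raw_scores.values())
    return {name: round(100.0 * score / top, 2) for name, score in raw_scores.items()}


# Hypothetical raw scores, printed from most to least important.
scores = relative_importance({"var_a": 8.0, "var_b": 4.0, "var_c": 1.0})
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<8}{score:7.2f}")
```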
% Correct Classification (test-holdout validation)

| Actual Class | Total Cases | Percent Correct | Predicted 1 (N=334) | Predicted 2 (N=344) | Predicted 3 (N=105) |
|---|---|---|---|---|---|
| 1: Stable H/M Profits | 400 | 69% | 275 | 98 | 27 |
| 2: High Risk H/M Profits | 317 | 75% | 58 | 239 | 20 |
| 3: Low Profit | 66 | 88% | 1 | 7 | 58 |
| Total | 783 | | | | |

Average: 77%
Overall % Correct: 73%
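The summary figures on this slide follow directly from the counts. The sketch below recomputes them: percent correct per class is the diagonal count divided by the row total, the Average is the unweighted mean of those three rates, and Overall % Correct is the diagonal sum divided by all holdout cases.

```python
# Rows: actual class; columns: predicted class 1, 2, 3 (counts from the holdout results on this slide).
matrix = {
    "Stable H/M Profits": [275, 98, 27],
    "High Risk H/M Profits": [58, 239, 20],
    "Low Profit": [1, 7, 58],
}

# Share of each actual class that was predicted correctly (diagonal / row total).
per_class = {label: row[i] / sum(row) for i, (label, row) in enumerate(matrix.items())}
average = sum(per_class.values()) / len(per_class)  # unweighted mean of class rates
total = sum(sum(row) for row in matrix.values())
overall = sum(row[i] for i, row in enumerate(matrix.values())) / total  # correct / all cases

for label, pct in per_class.items():
    print(f"{label}: {pct:.0%}")
print(f"Average: {average:.0%}  Overall: {overall:.0%}  (N={total})")
```

Running it prints 69%, 75%, and 88% for the three classes, then Average 77% and Overall 73% with N=783, matching the slide. The Average weights each class equally, which is why it differs from the Overall rate when class sizes are unbalanced.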
Finding New Factors to Optimize Pricing
Challenge 2
Modeling Problem:*
• Insurance pricing models have different distributional assumptions, e.g., Poisson, Gamma, Lognormal, Negative Binomial, Tweedie.
• The goal is to find one or two factors, out of over 200 geo-demographic variables, that could be added to the company’s pricing model to improve pricing (lower premiums without loss of profit).
*Done for another client, not AAA
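The distributional assumption changes what a GLM fit minimizes. As an illustration, the sketch below computes the standard deviance for a Poisson model (e.g., claim counts) and a Gamma model (e.g., claim severity) given observed values y and fitted means mu. It illustrates the different assumptions rather than a way to rank distributions against each other, since deviances from different families are not directly comparable.

```python
import math


def poisson_deviance(y, mu):
    """Total Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))


def gamma_deviance(y, mu):
    """Total Gamma deviance; requires strictly positive observations (e.g., severities)."""
    return 2.0 * sum(-math.log(yi / mi) + (yi - mi) / mi for yi, mi in zip(y, mu))


print(poisson_deviance([0, 1, 3], [0.5, 1.0, 2.0]), gamma_deviance([900.0, 1500.0], [1000.0, 1200.0]))
```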
Procedures Used:
• SAS PROC VARCLUS (Variable Clustering)
• CART (Initial Variable Selection)
• MARS (Variable Selection, Creation of Functions to enter into the model)
• SAS PROC GENMOD (Poisson and Gamma Distributions)
Role that MARS played in my models:
• Multivariate adaptive regression splines (MARS) is a form of regression analysis introduced by Jerome Friedman in 1991. It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models non-linearities and interactions.
• Accounted for non-linear relationships by creating (basis) functions for splines (departures from a straight line).
• Handled missing values through a process similar to CART surrogate splits, by identifying alternative basis functions.
• Like CART, it initially overfits the model, then prunes away components that do not hold up in the validation process.
• Entered the (basis) functions as predictors in PROC GENMOD
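MARS builds its model from hinge (truncated linear) basis functions that come in mirrored pairs around a knot. The sketch below shows how one predictor expands into such a pair; the knot at 40 is a hypothetical choice, whereas MARS itself searches for knots and interactions in a forward pass and prunes them in a backward pass. Columns like these are what would be entered as predictors in the GLM.

```python
def hinge(x: float, knot: float, direction: int = 1) -> float:
    """Hinge basis: max(0, x - knot) if direction is +1, max(0, knot - x) if -1."""
    return max(0.0, direction * (x - knot))


def basis_columns(values, knots):
    """Expand one predictor into a mirrored pair of hinge functions per knot."""
    return [[hinge(v, k, +1) for k in knots] + [hinge(v, k, -1) for k in knots] for v in values]


# Hypothetical predictor values and a single knot at 40.
print(basis_columns([30.0, 40.0, 55.0], knots=[40.0]))
# [[0.0, 10.0], [0.0, 0.0], [15.0, 0.0]]
```

Each pair lets the fitted line bend at the knot: the model can use a different slope on either side, which is how MARS captures the departures from linearity shown on the next slide.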
Screenshots of plots illustrating departures from linearity. Classical modeling approaches do not account for these departures, which highlights the importance of the CART/MARS steps in the modeling process flow.
Main Modeling Steps:
• Appended over 200 census-based variables to a sample of over 100,000 records from the insurance database, keeping claim frequency and premium/loss information to compute the target variables.
• Clustered the variables (using SAS PROC VARCLUS) to explore the data structure, which reduced the number of variables to 90.
• Ran the dataset through CART (exploratory regression tree) to find the relative importance of potential predictors and to check surrogate and competitor variables, noting variable importance. The target variables, modeled separately, were Claims Counts and Severity (loss/claim) in dollars, both treated as continuous.
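A sketch of deriving the target variables from the kept premium/loss information. The field names (exposure, claims, losses, premium) and the use of exposure to turn counts into frequencies are assumptions; severity is only defined for records with at least one claim, so it is left as None otherwise.

```python
def target_variables(policies):
    """Compute claim frequency, severity, and loss ratio per record (hypothetical field names)."""
    out = []
    for p in policies:
        severity = p["losses"] / p["claims"] if p["claims"] > 0 else None
        out.append({
            "frequency": p["claims"] / p["exposure"],  # claims per unit of exposure
            "severity": severity,                       # dollars per claim
            "loss_ratio": p["losses"] / p["premium"],  # losses per premium dollar
        })
    return out


print(target_variables([{"exposure": 1.0, "claims": 2, "losses": 3000.0, "premium": 1500.0}]))
```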
Main Modeling Steps (cont):
• Ran the 90-variable dataset through MARS and compared the results with CART; selected the final set of variables that CART and MARS ranked as important, reducing the set to 15 variables.
• Ran MARS on the 15 variables to obtain the (basis) functions.
• Built models in SAS PROC GENMOD with Claims Frequency and Severity (loss/claim) as targets under different distributional assumptions, and the MARS (basis) functions as predictors.
• Validated the models on a holdout sample; final models had 10-15 variables.
• The pricing group tested the variables alongside the existing pricing factors.
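To show what "MARS basis functions as predictors in a GLM" means, here is a one-predictor Poisson frequency model with a log link, log(E[y]) = b0 + b1 * basis, fit by Newton-Raphson in plain Python. PROC GENMOD handles many predictors, offsets, and other distributions such as Gamma; this sketch only illustrates the idea, and the ages, knot, and synthetic frequencies are hypothetical.

```python
import math


def fit_poisson_log_link(x, y, iterations=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson (equivalent to IRLS for this model)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix [[h00, h01], [h01, h11]], inverted in closed form.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical ages expanded into one hinge basis function with a knot at 40.
ages = [25.0, 35.0, 45.0, 55.0, 65.0]
basis = [max(0.0, a - 40.0) for a in ages]
# Synthetic frequencies that follow exp(0.1 + 0.04 * basis) exactly.
freq = [math.exp(0.1 + 0.04 * b) for b in basis]
b0_hat, b1_hat = fit_poisson_log_link(basis, freq)
print(round(b0_hat, 4), round(b1_hat, 4))
```

Because the synthetic frequencies lie exactly on the curve, the fit should recover coefficients of about 0.1 and 0.04.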
Sample Results: Severity Model (Gamma Distribution, Log Link)
[Figure: Predicted and Actual Losses by decile (deciles 1-10), comparing Actual Loss with Predicted Loss]
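A chart like this can be built by ranking records on predicted loss, cutting them into ten equal-count groups, and averaging actual and predicted loss within each group. The sketch assumes deciles of predicted loss (the slide does not state how its deciles were formed), and the data in the usage line are synthetic.

```python
def decile_table(actual, predicted, groups=10):
    """Sort records by predicted loss, split into equal-count groups, and average each group.

    Returns (group number, mean actual, mean predicted); group 1 has the lowest predictions."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue  # fewer records than groups
        table.append((g + 1,
                      sum(actual[i] for i in idx) / len(idx),
                      sum(predicted[i] for i in idx) / len(idx)))
    return table


predicted = [float(i) for i in range(20)]
actual = [2 * p for p in predicted]  # synthetic: actual runs at twice the prediction
for row in decile_table(actual, predicted):
    print(row)
```

A well-calibrated model shows the actual and predicted means tracking each other across the deciles, while a steep rise from decile 1 to 10 shows the model separates low- and high-loss records.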
You can use the approach for any linear modeling, including multiple regression and logistic regression, which both belong to the family of generalized linear models.
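Within that family, the same linear predictor (which can include MARS basis functions) feeds different models; only the link function and distribution change. A minimal sketch of the three links relevant here; the coefficients in the usage line are hypothetical.

```python
import math

# Inverse link functions: map the linear predictor eta = b0 + b1*x1 + ... to the mean.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # multiple regression (normal)
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic regression (binomial)
    "log": math.exp,                                    # e.g., Poisson frequency, Gamma severity
}


def predict(coefs, intercept, x, link):
    """Compute the linear predictor, then apply the chosen inverse link."""
    eta = intercept + sum(c * xi for c, xi in zip(coefs, x))
    return INVERSE_LINKS[link](eta)


for link in INVERSE_LINKS:
    print(link, predict([0.2], -1.0, [3.0], link))
```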