TRANSCRIPT
22 November 2018
Overview and Practical Application of Machine Learning Methods in Pricing
Lewis Maddock
© 2018 Willis Towers Watson. All rights reserved.
Contacts
Confidentiality Statement
This presentation has been provided for the sole and exclusive use of the members of the Swedish Actuarial Society.
This document is for discussion purposes only with other members of the actuarial society and not to be relied upon for any other purpose.
Distribution or disclosure of, or quotation from, or reference to this document to any other party, is prohibited without the prior written consent of Willis Towers Watson.
For questions or further information about the content of this presentation, please use the contact details shown below.
Lewis Maddock, Director, Nordic P&C, Investment, Risk & Reinsurance
Lästmakargatan 22, 111 44 Stockholm, Sweden
+46 (0) 721 633 592
lewis.maddock@willistowerswatson
Introduction
What I promised to do…
“We take a look at the common machine learning methods in use today and try to extract the value from the hype around this field”
What is needed to make them work?
What are the strengths and weaknesses of these methods?
Where does the role of the pricing actuary and the actuarial profession fit in?
Who’s interested in what?
[Chart: a simple linear model, y = a + bx]
Applications of machine learning in the insurance sector
Functions: pricing; underwriting and risk management; asset management; claims management; customer management; marketing and distribution.
Example applications: agent/broker performance evaluation; P&C-to-Life cross-selling; topic modeling and large loss modeling; web-scraping and feature design in commercial lines; product development (UBI); fraud detection; retention segmentation and program design.
[Chart: exposure and excess residuals from low to high levels of a rating factor]
This is not new…
Enabling technologies, 1990s to 2018: hyper-scale parallel computing; distributed Big Data storage/Hadoop; NoSQL databases; data visualisation tools; free software environments and analytics libraries; machine learning; data stream and real-time processing supporting IoT; integrated environments and services.
Pricing practice has evolved over the same period, from GLMs to other "non-GLM" models: few factors and simple methods; GLMs in auto risk models; GLM refinement & LOB expansion; data enrichment; GLMs in demand models; integrating cost and demand; more data enrichment.
What are these machine learning methods?
- Classification Trees
- Regression Trees
- Ensembles
- "Earth"
- K-Nearest Neighbors
- Naïve Bayes
- Neural Networks
- K-Means Clustering
- Principal Components Analysis
- Support Vector Machines
- Gradient Boosting Machines
- Random Forests
- Ridge Regression
- Lasso
- Elastic Net
How do you know if a method works?
Candidate measures: Gini, MAE, log loss, AIC, RMSE.
How do you measure value?
Rank hold out observations by their fitted values (high to low)
Plot cumulative response by cumulative exposure
A better model will explain a higher proportion of the response with a lower proportion of exposure
…and will give a higher Gini coefficient (yellow area)
But…
Think of a model…
Multiply it by 123
Square it
Add 74½ billion
…and you get the same Gini coefficient!
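The construction of the Gini coefficient, and its invariance to any rank-preserving transformation, can be sketched in a few lines. The model and actual values below are made-up illustrative numbers:

```python
def gini(actual, fitted, exposure=None):
    """Rank hold-out observations by fitted value (high to low), accumulate
    response against exposure, and return twice the area above the diagonal."""
    n = len(actual)
    exposure = exposure or [1.0] * n
    order = sorted(range(n), key=lambda i: -fitted[i])
    total_a, total_e = sum(actual), sum(exposure)
    cum_a = cum_e = area = 0.0
    for i in order:
        a0, e0 = cum_a / total_a, cum_e / total_e
        cum_a += actual[i]
        cum_e += exposure[i]
        a1, e1 = cum_a / total_a, cum_e / total_e
        area += (a0 + a1) / 2 * (e1 - e0)   # trapezoid under the lift curve
    return 2 * area - 1

actual = [10, 0, 5, 1, 8, 0, 3, 0]
model = [9.0, 0.5, 4.0, 1.5, 7.0, 0.2, 2.5, 0.8]
# multiply by 123, square it, add 74.5 billion: the ranking is unchanged...
transformed = [(m * 123) ** 2 + 74.5e9 for m in model]
# ...so the Gini coefficient is identical
```

Because the Gini uses only the ranking of the fitted values, any strictly increasing transformation of a (positive) model leaves it unchanged, which is exactly the point of the slide.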
Double lift chart
Policies are grouped into buckets of the ratio Proposed Model / Current Model, from < 88% up to > 112% in 2% steps.
[Chart: exposure, observed response, current model and proposed model prediction by Proposed Model / Current Model bucket]
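A double lift chart of this kind can be assembled with a few lines of code. The data below are simulated purely for illustration, with losses generated to follow the proposed model:

```python
import random
from collections import defaultdict

random.seed(42)
n = 5000
current = [random.uniform(80, 120) for _ in range(n)]         # current model
proposed = [c * random.uniform(0.85, 1.15) for c in current]  # proposed model
observed = [p * random.gauss(1.0, 0.2) for p in proposed]     # simulated losses

# Bucket policies by Proposed / Current in ~2%-wide bands
bands = defaultdict(list)
for c, p, o in zip(current, proposed, observed):
    bands[round(p / c * 50)].append((o, c, p))

# One chart row per band: ratio, observed, current model, proposed model
chart = []
for key in sorted(bands):
    pts = bands[key]
    m = len(pts)
    chart.append((key / 50,
                  sum(o for o, _, _ in pts) / m,
                  sum(c for _, c, _ in pts) / m,
                  sum(p for _, _, p in pts) / m))
```

In the extreme bands, where the two models disagree most, the observed experience tracks the proposed model; that is what a better proposed model should show.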
Financial value estimate
The comparison uses the old model prediction, the new model prediction and actual experience. Simple formula, tabulated per percentile of old vs new fitted values:

Old/New   New premium   Expected volume   Actual claims   Increased profit
121%      P1            V1                C1              X1
…         …             …                 …               …
85%       P100          V100              C100            X100
Value created: $X
Errors in insurance pricing are not symmetrical
Financial benefit can be estimated
Consider actual experience in out-of-sample data for each percentile of old vs new model fitted values
Estimate financial benefit that would have been attained given an assumed elasticity
given business rules such as an assumed cap/floor approach
2.1% on loss ratio…
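The mechanics of the percentile table can be sketched as below. Everything here is hypothetical: the premiums, the new/old ratios, the stand-in claims, the flat 95% retention standing in for an elasticity assumption, and the ±10% cap/floor business rule.

```python
import random

random.seed(7)
n = 10_000
old_prem = [random.uniform(400, 600) for _ in range(n)]
ratio = [random.uniform(0.85, 1.21) for _ in range(n)]  # new / old fitted values
claims = [0.7 * p for p in old_prem]                    # stand-in actual claims

retention = 0.95          # assumed flat retention after repricing
floor, cap = 0.90, 1.10   # assumed cap/floor business rule

# Rank policies by Old/New (high to low) and form 100 percentile buckets
order = sorted(range(n), key=lambda i: ratio[i])
buckets = [order[i * 100:(i + 1) * 100] for i in range(100)]

rows, value = [], 0.0
for b in buckets:
    old_p = sum(old_prem[i] for i in b)
    new_p = sum(old_prem[i] * min(max(ratio[i], floor), cap) for i in b)
    clm = sum(claims[i] for i in b)
    uplift = retention * (new_p - old_p)  # claims unchanged on retained business
    # row mirrors the slide: Old/New, premium, volume, claims, increased profit
    rows.append((old_p / new_p, new_p, retention * len(b), clm, uplift))
    value += uplift
```

A real estimate would let retention vary with the size of the premium change (the elasticity assumption); the flat factor here just keeps the sketch short.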
Is there more to it…?
Predictive power depends on more than the method: the data, and factor engineering & knowing what to model, matter too.
Choosing a method
Dimensions of utility for a method: analytical time and effort; predictive power; execution speed; table implementation; interpretation; stability.
[Radar chart: GLM scored on the six dimensions of utility]
Focus on Trees
Decision Trees
[Diagram: all data is split on Group < 5?; the resulting branches are split on Age < 40? and Group < 15?, partitioning the Group/Age plane into rectangles]
A simple Tree example
[Charts: the tree is grown one split at a time: first Group < 3?, then Group < 16?, then Group < 18?]
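A tree grows by repeatedly making the single best split. A minimal sketch of that greedy search, on made-up data echoing the Group < 3 example:

```python
def best_split(xs, ys):
    """Return the split point t (rule: x < t) minimizing squared error."""
    best_t, best_sse = None, float("inf")
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if sse < best_sse:
            best_t, best_sse = t, sse
    return best_t

# Hypothetical response with a big jump at Group 3 and a smaller one at 16
groups = list(range(1, 21))
response = [0.0 if g < 3 else (0.8 if g < 16 else 1.0) for g in groups]

first = best_split(groups, response)  # the biggest jump wins the first split
rest = [(g, r) for g, r in zip(groups, response) if g >= first]
second = best_split([g for g, _ in rest], [r for _, r in rest])
```

The first split lands on Group < 3 because that jump explains the most squared error; the second pass, run on the remaining branch, finds Group < 16.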
Trees are greedy
Let's build a simple model: seminar feedback score by equation count (# Equations).
[Chart: seminar feedback score by equation count]
The structure that fits is an interaction: split on Actuary?, then on #Eq. < 3? within each branch, with the sign of the effect reversed between the branches.
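Why greediness hurts here can be shown numerically. In the made-up data below the feedback score is driven purely by the interaction (actuaries like equations, everyone else does not), so no single first split reduces the squared error at all:

```python
def sse_after_split(rows, feature):
    """Squared error remaining after one split on a boolean feature."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    return (sse([r["score"] for r in rows if r[feature]])
            + sse([r["score"] for r in rows if not r[feature]]))

# score = +1 when 'actuary' and 'many_eq' agree, -1 otherwise
rows = [{"actuary": a, "many_eq": e, "score": 1 if a == e else -1}
        for a in (True, False) for e in (True, False) for _ in range(10)]

baseline = sum(r["score"] ** 2 for r in rows)  # SSE around the overall mean (0)
split_a = sse_after_split(rows, "actuary")     # no improvement over baseline
split_e = sse_after_split(rows, "many_eq")     # no improvement over baseline
# but splitting on many_eq *within* each actuary branch is perfect:
depth2 = (sse_after_split([r for r in rows if r["actuary"]], "many_eq")
          + sse_after_split([r for r in rows if not r["actuary"]], "many_eq"))
```

A greedy tree that evaluates first splits by error reduction sees zero gain from either factor on its own, so it can fail to discover the interaction entirely.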
Trees are bad at categorical variables
[Chart: seminar feedback score by first initial]
A tree can split the levels into an arbitrary-looking partition: First initial = B, D, E, I, J, L, M, O, Q, R, U, V or Y? Y: Low; N: High.
Trees are bad at turning points
[Chart: seminar feedback score by talk length]
Example: 30 <= Length <= 70? Y: High; N: Low.
Shortcomings of using trees: summary
They may miss interactions…
…they may struggle with categorical variables…
…and they can be bad at turning points.
[Radar chart: Decision Trees scored on the six dimensions of utility]
Some machine learning methods
Focus on Random Forests
Random Forests
Tree 1: Prediction 1 = Signal 1 + Noise 1
Tree 2: Prediction 2 = Signal 2 + Noise 2
Tree 3: Prediction 3 = Signal 3 + Noise 3
…
Tree 1000: Prediction 1000 = Signal 1000 + Noise 1000
Random Forest:
Prediction = AVERAGE(Tree Predictions)
           = AVERAGE(Tree Signal) + AVERAGE(Tree Noise)
Average Noise → 0 if the trees are independent.
Independence of trees is achieved by fitting each tree to:
- a random subset of data (bootstrap sample)
- a random subset of factors
Average Signal → the underlying trend, provided the trees are complex enough to represent it.
This is bagging (bootstrap aggregation): fit lots of independent models and take an average.
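The averaging argument can be checked directly. Here each "tree" is abstracted as the true signal plus its own independent noise (a deliberate simplification; real trees share data and are only partly independent), and averaging many of them shrinks the noise roughly by 1/√B:

```python
import random
import statistics

random.seed(1)
SIGNAL = 10.0

def tree_prediction():
    # one tree: the underlying signal plus independent noise
    return SIGNAL + random.gauss(0, 2.0)

# a single tree vs a 'forest' that averages 500 independent trees
single_trees = [tree_prediction() for _ in range(2000)]
forests = [statistics.fmean(tree_prediction() for _ in range(500))
           for _ in range(200)]

spread_single = statistics.stdev(single_trees)  # about 2
spread_forest = statistics.stdev(forests)       # about 2 / sqrt(500)
```

The forest's spread collapses towards zero while its average stays on the signal, which is the AVERAGE(Noise) → 0 claim above.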
A simple Random Forest example
[Charts: the example is built up over several slides]
[Radar chart: Random Forest scored on the six dimensions of utility]
Some machine learning methods
Focus on Gradient Boosting Machines
Gradient Boosting Machine or "GBM"
[Diagram: a tree: all data split on Group < 5?, with further splits on Age < 40? and Group < 15?]
A GBM is a sum of many such trees, each scaled by a learning rate λ:
λ·T1 + λ·T2 + λ·T3 + … + λ·Tn
Four main assumptions
Learning rate / "shrinkage": the amount by which the old model predictions are varied for the next model iteration:
New model = Old + (Prediction × Learning rate)
Interaction depth: the number of splits allowed on each tree (or the number of terminal nodes − 1)
N: the number of trees (iterations) allowed
Bag fraction: trees are fitted to a subset of the data (the bag fraction) on a randomized basis; additional noise reduction can be achieved by using a random subset of the available factors at each iteration
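The update rule (New model = Old + Prediction × Learning rate) can be written out as a toy GBM built from depth-1 trees (stumps). The step-shaped trend and all parameter values are illustrative only:

```python
def fit_stump(xs, ys):
    """One-split regression tree, i.e. interaction depth 1."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr

def gbm(xs, ys, n_trees=100, learning_rate=0.1):
    """Fit each tree to the current residuals; shrink updates by the learning rate."""
    fitted = [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - f for y, f in zip(ys, fitted)]
        stump = fit_stump(xs, residuals)
        # New model = Old + (Prediction x Learning rate)
        fitted = [f + learning_rate * stump(x) for f, x in zip(fitted, xs)]
    return fitted

xs = list(range(1, 21))
ys = [0.0] * 10 + [1.0] * 10      # underlying trend: a step at x = 11
fitted = gbm(xs, ys)
```

With λ = 0.1 the residual on this step shrinks by a factor 0.9 per iteration, so after 100 trees the fit is within 0.9^100 ≈ 3×10⁻⁵ of the trend.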
A simple GBM example
Parameters: # factors = 1; interaction depth = 1; learning rate = 10%; bag fraction = 100%.
[Charts: GBM results at iterations 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 100, 200, 300 and 1,000; each plots the current residuals, the model trained on the current residuals, the incremental model update, the underlying trend and the current fitted values across 20 data points, with the fitted model accumulating as λ·T1 + λ·T2 + …]
Calibrating the assumptions
n-fold cross validation is used to develop the interaction depth and learning rate assumptions. E.g. for 3-fold validation: split the data into three parts, fit on two and test on the third, rotating the test part, then take the average.
[Diagram: 3-fold fit/test split]
The resulting plots can be used to determine the optimal assumption choice, including how many trees to run.
Example 5-fold cross validation
The best result is shown by the brown line, as it has the lowest minimum validation error (interaction depth 2 and learning rate 2% in this case). The minimum point shows the optimal number of trees in each case. This example is based on artificial data; large insurance datasets indicate a larger number of trees to be optimal.
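The fit/test rotation can be sketched generically. Below, 3-fold cross validation compares a deliberately overfitting model (1-nearest-neighbour) against a simple mean on pure-noise data; CV correctly prefers the simpler model even though 1-NN fits its training data perfectly. The data and both models are illustrative only:

```python
import random

random.seed(0)
xs = [i / 50 for i in range(300)]
ys = [0.5 + random.gauss(0, 0.3) for _ in xs]   # pure noise around 0.5

def folds(n, k=3):
    """Yield (train, test) index lists: fit on k-1 parts, test on the rest."""
    for i in range(k):
        test = [j for j in range(n) if j % k == i]
        train = [j for j in range(n) if j % k != i]
        yield train, test

def cv_error(make_model, k=3):
    total, count = 0.0, 0
    for train, test in folds(len(xs), k):
        predict = make_model([xs[j] for j in train], [ys[j] for j in train])
        total += sum((ys[j] - predict(xs[j])) ** 2 for j in test)
        count += len(test)
    return total / count

def mean_model(tx, ty):
    m = sum(ty) / len(ty)
    return lambda x: m

def nearest_neighbour(tx, ty):
    return lambda x: ty[min(range(len(tx)), key=lambda i: abs(tx[i] - x))]
```

The same machinery, run over a grid of candidate interaction depths, learning rates and tree counts, produces the validation-error curves described above.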
What does a GBM look like?
[Charts: fitted GBM output]
Does it work? How does it work?
Deploying GBMs

  Age          Exposure   Burning Cost      Vehicle Group   Exposure   Burning Cost
1 <=20            1,720            179    1 1-10             164,107             77
2 21-30          34,893            122    2 11-14             84,859            101
3 31-50         118,182            102    3 15-18             28,952            116
4 51+           127,054             70    4 19-20              3,931            272
5 Age Total     281,849             91    5 VG Total         281,849             91

  Gender         Exposure   Burning Cost
1 Male            197,339             92
2 Female           84,510             87
3 Gender Total    281,849             91

Options for deployment:
- Model down into multiplicative tables via GLMs
- Use insights to guide the GLM: factor reduction, establishing a model hierarchy, corner correctors and pre-baked interactions
- Deploy directly
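Modelling down into multiplicative tables can be illustrated in miniature. The base rate and relativities below are invented; on a balanced grid whose predictions really are multiplicative, per-level geometric averages recover the tables exactly:

```python
import math

# hypothetical 'black box' predictions, exactly multiplicative in two factors
base = 91.0
age_rel = {"<=20": 1.9, "21-30": 1.3, "31-50": 1.1, "51+": 0.8}
vg_rel = {"1-10": 0.85, "11-14": 1.1, "15-18": 1.25, "19-20": 2.9}
cells = [(a, g, base * age_rel[a] * vg_rel[g]) for a in age_rel for g in vg_rel]

def geo_mean(vals):
    return math.exp(sum(map(math.log, vals)) / len(vals))

# model down: per-level geometric averages relative to the overall level
overall = geo_mean([p for _, _, p in cells])
age_table = {a: geo_mean([p for x, _, p in cells if x == a]) / overall
             for a in age_rel}
vg_table = {g: geo_mean([p for _, x, p in cells if x == g]) / overall
            for g in vg_rel}
```

In practice this step is done with a GLM fitted to the model's predictions, which also copes with unbalanced exposure and residual interactions; the geometric-average shortcut only works on this idealized grid.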
Deploying GBMs
Deploy directly, with pre/post mapping between data and model and "comfort diagnostics" such as the double lift chart.
[Chart: double lift chart: observed, current model and proposed model by Proposed Model / Current Model bucket]
[Radar chart: GBMs scored on the six dimensions of utility; table implementation improves with implementation in modern rating engines]
Some machine learning methods
Focus on Neural Networks
Where is the value?
Which policyholder is more likely to make a claim?
Which picture is more likely to be of a cat?
Neural networks
[Diagram: neural network]
[Radar chart: Neural network scored on the six dimensions of utility]
Some machine learning methods
Focus on Penalized Regression
Penalized Regression
GLM: f(x) = g⁻¹(X·β), where β is estimated by minimizing the deviance D(β).
Penalized regression adds a penalty to that objective:
Ridge: minimize D(β) + λ Σ βj²; heavily penalizes large parameters, but does not reduce parameters to zero.
Lasso: minimize D(β) + λ Σ |βj|; the penalty reduces insignificant parameter values to zero, which is useful for variable selection.
Elastic Net: a mix of the two.
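For a single standardized factor (the orthonormal one-dimensional case) the three penalties have closed-form solutions, which makes the behaviour easy to verify; the coefficient and λ values below are arbitrary:

```python
def soft_threshold(z, lam):
    """The lasso's one-dimensional solution: shrink towards zero, clip at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

b_ols = 0.08   # a small, 'insignificant' OLS coefficient
lam = 0.1

b_ridge = b_ols / (1 + lam)           # ridge: shrunk, but never exactly zero
b_lasso = soft_threshold(b_ols, lam)  # lasso: exactly zero, variable dropped
alpha = 0.5                           # elastic net mixing weight
b_enet = soft_threshold(b_ols, alpha * lam) / (1 + (1 - alpha) * lam)
```

A larger coefficient, say b_ols = 0.8, survives the lasso (shrunk to 0.7) while ridge merely scales it down; that asymmetry is why the lasso is used for variable selection.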
Deploying Penalized Regression
Same as GLMs! The output is a set of multiplicative rating tables:

  Age          Exposure   Loss Cost      Vehicle Group   Exposure   Loss Cost
1 <=20            1,720         179    1 1-10             164,107          77
2 21-30          34,893         122    2 11-14             84,859         101
3 31-50         118,182         102    3 15-18             28,952         116
4 51+           127,054          70    4 19-20              3,931         272
5 Age Total     281,849          91    5 VG Total         281,849          91

  Gender         Exposure   Loss Cost
1 Male            197,339          92
2 Female           84,510          87
3 Gender Total    281,849          91
[Radar chart: Penalized Regression scored on the six dimensions of utility]
A toolkit…
[Radar charts: Emblem, "Earth", GBMs, Trees, Random Forests, Support Vector Machines and Neural Networks, each scored on the six dimensions of utility]
Conclusions
[Radar chart: Emblem scored on the six dimensions of utility]
What do you use where?
[Chart: the mix of data science and domain expertise varies by application]
It's domain expertise that helps decide.
Issues for the Profession(s)
Regulatory issues:
- TAS: judgement, but what judgement?
- GDPR
- UK Government Select Committee (Science and Technology)
Training:
- CAS, SOA ahead? (e.g. CSPA)
- IFoA playing catch-up
- Is it too late?
Role of the actuary:
- Domain expertise matters (at least currently)
- Easier for an actuary to pick up machine learning than for a data scientist to understand insurance?
- Siloed teams don't work
- Familiarity and the right vernacular can help
Scope of involvement? Pricing, reserving, claims analytics, customer management? Marketing???
Questions and Discussion