
Page 1

Overview and Practical Application of Machine Learning Methods in Pricing

Lewis Maddock

22 November 2018

© 2018 Willis Towers Watson. All rights reserved.

Contacts


Confidentiality Statement

This presentation has been provided for the sole and exclusive use of the members of the Swedish Actuarial Society.

This document is for discussion purposes only with other members of the actuarial society and not to be relied upon for any other purpose.

Distribution or disclosure of, or quotation from, or reference to this document to any other party, is prohibited without the prior written consent of Willis Towers Watson.

For questions or further information about the content of this presentation, please use the contact details shown.

Lewis Maddock, Director, Nordic P&C, Investment, Risk & Reinsurance

Lästmakargatan 22, 111 44 Stockholm, Sweden · +46 (0) 721 633 592 · lewis.maddock@willistowerswatson.com

Page 2

Introduction


What I promised to do…

“We take a look at the common machine learning methods in use today and try to extract the value from the hype around this field”

What is needed to make them work?

What are the strengths and weaknesses of these methods?

Where does the role of the pricing actuary and the actuarial profession fit in?

Who’s interested in what?

[Plot: points in the (x, y) plane with a fitted line y = a + bx]

Page 3

Applications of machine learning in the insurance sector

Underwriting and risk management · Pricing · Asset management · Claims management · Customer management · Marketing and distribution

Examples: agent/broker performance evaluation; P&C to Life cross-selling; topic modeling and large loss modeling; web-scraping and feature design in commercial lines; product development (UBI); fraud detection; retention segmentation and program design

[Chart: exposure and excess residuals, from low rating factor to high rating factor]

This is not new… A timeline from the 1990s to 2018: few factors and simple methods; GLMs in auto risk models; GLM refinement & LOB expansion; data enrichment; GLMs in demand models; integrating cost and demand; more data enrichment; other “Non-GLM” models.

Enabling technology over the same period: hyper-scale parallel computing; distributed Big Data storage/Hadoop; NoSQL databases; data visualisation tools; free software environments and analytics libraries; machine learning; data stream and real-time processing supporting IoT; integrated environments and services.

Page 4

What are these machine learning methods?

Ensembles · Classification Trees · “Earth” · K-nearest Neighbors · Elastic Net · Neural Networks · Regression Trees · Naïve Bayes · K-Means Clustering · Principal Components Analysis · Lasso · Support Vector Machines · Gradient Boosting Machines · Random Forests · Ridge Regression

How do you know if a method works?

Gini · MAE · Log loss · AIC · RMSE

Page 5

How do you measure value?


Rank hold-out observations by their fitted values (high to low)

Plot cumulative response by cumulative exposure

A better model will explain a higher proportion of the response with a lower proportion of exposure

…and will give a higher Gini coefficient (yellow area)

[Lift chart: cumulative response vs cumulative exposure; the Gini coefficient is the yellow area between the curve and the diagonal]

But…


Think of a model…

Multiply it by 123

Square it

Add 74½ billion

…and you get the same Gini coefficient!
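A minimal sketch of the Gini calculation just described (not from the talk; synthetic data). Because the Gini coefficient depends only on the ranking of the fitted values, any order-preserving transformation of the predictions — such as the multiply/square/add recipe above — leaves it unchanged:

```python
import numpy as np

def gini(actual, predicted, exposure=None):
    """Exposure-weighted Gini: rank records by fitted value (high to low),
    plot cumulative response vs cumulative exposure, double the area above
    the diagonal."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    exposure = np.ones_like(actual) if exposure is None else np.asarray(exposure, dtype=float)
    order = np.argsort(-predicted)                                 # best risks first
    ce = np.concatenate(([0.0], np.cumsum(exposure[order]) / exposure.sum()))
    cr = np.concatenate(([0.0], np.cumsum(actual[order]) / actual.sum()))
    auc = np.sum((cr[1:] + cr[:-1]) / 2 * np.diff(ce))             # trapezoid rule
    return 2.0 * (auc - 0.5)

rng = np.random.default_rng(0)
risk = rng.gamma(2.0, 100.0, 10_000)                               # true expected cost
claims = rng.poisson(risk / 100.0) * 100.0                         # observed response
model = risk * rng.lognormal(0.0, 0.3, 10_000)                     # an imperfect model

print(gini(claims, model))
print(gini(claims, (model * 123.0) ** 2 + 74.5e9))                 # identical: Gini only sees ranks
```

This is why a Gini score says nothing about whether a model is calibrated to the right level — hence the double lift chart that follows.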

Page 6

Double lift chart

[Double lift chart: policies are banded by the ratio Proposed Model / Current Model (from < 88% to > 112% in 2% steps); bars show exposure (0–100,000) per band, and lines show the response (4.4%–6.4%) for Observed, Current Model and Proposed Model]
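A sketch of how such a chart can be assembled (hypothetical column names; equal-count buckets rather than the fixed 2% bands shown on the slide):

```python
import pandas as pd

def double_lift(df, current="current_pred", proposed="proposed_pred",
                observed="observed", exposure="exposure", n_bins=14):
    # Band policies by how differently the two models rate them
    df = df.assign(bucket=pd.qcut(df[proposed] / df[current], n_bins, duplicates="drop"))
    g = df.groupby("bucket", observed=True)
    return pd.DataFrame({
        "exposure": g[exposure].sum(),
        "observed": g[observed].sum() / g[exposure].sum(),
        "current":  g[current].sum() / g[exposure].sum(),
        "proposed": g[proposed].sum() / g[exposure].sum(),
    })
```

Plotted per bucket, the better model's line tracks the observed line more closely at both extremes.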

Financial value estimate

Compare old model prediction, actual experience and new model prediction.

Simple formula:

Old/New   New premium   Expected volume   Actual claims   Increased profit
121%      P1            V1                C1              X1
…         P2            V2                C2              X2
…         …             …                 …               …
…         P99           V99               C99             X99
85%       P100          V100              C100            X100

Value created: $X

Errors in insurance pricing are not symmetrical

Financial benefit can be estimated

Consider actual experience in out-of-sample data for each percentile of old vs new model fitted values

Estimate the financial benefit that would have been attained, given an assumed elasticity and business rules such as an assumed cap/floor approach

2.1% on loss ratio…
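A back-of-envelope sketch of that benefit estimate (everything here is an assumption for illustration — the cap/floor, the linear elasticity response, and the input arrays):

```python
import numpy as np

def value_estimate(old_prem, new_prem, claims, cap=1.10, floor=0.90, elasticity=0.0):
    """Re-rate a hold-out period under the new model, subject to a cap/floor
    on the Old->New premium change, with a simple linear retention response
    to that change. Returns the loss-ratio improvement."""
    change = np.clip(new_prem / old_prem, floor, cap)            # business rules
    charged = old_prem * change
    keep = np.clip(1.0 - elasticity * (change - 1.0), 0.0, 1.0)  # expected volume
    old_lr = claims.sum() / old_prem.sum()
    new_lr = (claims * keep).sum() / (charged * keep).sum()
    return old_lr - new_lr                                       # e.g. 2.1% on loss ratio
```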

Page 7

Is there more to it…?

Predictive power rests on three things:

Data

Factor engineering & knowing what to model

Method

Choosing a method

Dimensions of utility

[Radar chart rating a method on six axes: analytical time and effort, predictive power, execution speed, table implementation, interpretation, stability]

Page 8

[Radar chart: GLM profile on the six utility dimensions]

Focus on Trees

[Method cloud as before, with Classification Trees and Regression Trees highlighted]

Page 9

Decision Trees

[Tree diagram: all data splits on Group < 5? (Y/N); branches split further on Age < 40? and Group < 15?, partitioning the Age × Group plane into rectangles]

A simple Tree example

Page 10

A simple Tree example

[The tree is grown step by step: first Group < 3? (Y/N), then Group < 16? (Y/N)]

Page 11

A simple Tree example

[…and then Group < 18? (Y/N)]

Trees are greedy

Let’s build a simple model: Seminar Feedback Score by Equation Count (# Equations)

[Scatter chart not reproduced]

Page 12

Trees are greedy

[With Seminar Feedback Score plotted against Equation Count, a single greedy split on #Eq. < 3? finds little signal. The real structure is an interaction: split on Actuary? (Y/N) first, then on #Eq. < 3? within each branch — the two branches carry opposite signs (+/−), so a greedy tree can miss the pattern entirely]

Page 13


Trees are bad at categorical variables

[Chart: Seminar Feedback Score by First Initial]

Page 14

Trees are bad at categorical variables

[Tree diagram: First initial = B, D, E, I, J, L, M, O, Q, R, U, V or Y? → Low (Y) / High (N). The split simply memorizes an arbitrary grouping of levels from the noise in the chart]

Trees are bad at turning points

[Chart: Seminar Feedback Score by Talk Length]

Page 15

Trees are bad at turning points

[Tree diagram: 30 <= Length <= 70? → High (Y) / Low (N). A single split cannot represent a hump; it takes a pair of splits to box in the turning point]

Shortcomings of using trees

Summary

They may miss interactions…

…they may struggle with categorical variables…

…and they can be bad at turning points
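A quick illustration of the turning-point problem, sketched with scikit-learn on made-up “talk length” data (the variable names and numbers are assumptions, not from the talk):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
length = rng.uniform(0, 100, 500)                               # talk length
score = -((length - 50) / 50) ** 2 + rng.normal(0, 0.1, 500)    # hump-shaped signal

# Depth-1 tree: one split cannot represent a turning point
stump = DecisionTreeRegressor(max_depth=1).fit(length.reshape(-1, 1), score)
print(export_text(stump, feature_names=["length"]))

# Depth-2 tree: two splits box in the optimum (roughly 30 <= length <= 70)
tree = DecisionTreeRegressor(max_depth=2).fit(length.reshape(-1, 1), score)
print(export_text(tree, feature_names=["length"]))
```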

Page 16

[Radar chart: Decision Trees profile on the six utility dimensions]

Some machine learning methods

[Method cloud as before]

Page 17

Focus on Random Forests

[Method cloud as before, with Random Forests highlighted]

Random Forests


Tree 1: Prediction 1 = Signal 1 + Noise 1

Tree 2: Prediction 2 = Signal 2 + Noise 2

Tree 3: Prediction 3 = Signal 3 + Noise 3

…

Tree 1000: Prediction 1000 = Signal 1000 + Noise 1000

Random Forest:

Prediction = AVERAGE(Tree Predictions)

= AVERAGE(Tree Signal) + AVERAGE(Tree Noise)

Average Noise → 0 if the trees are independent

Independence of trees achieved by fitting each tree to:

Random subset of data (bootstrap sample)

Random subset of factors

Average Signal → Underlying trend, provided trees are complex enough to represent it

This is bagging (bootstrap aggregation) – fit lots of independent models and take an average
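A hand-rolled sketch of exactly this recipe (synthetic data; in practice a library class such as scikit-learn's RandomForestRegressor wraps the whole loop):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(2_000, 5))
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] + rng.normal(0, 0.5, 2_000)  # signal + noise

trees = []
for _ in range(300):
    rows = rng.integers(0, len(X), len(X))        # random subset of data (bootstrap sample)
    tree = DecisionTreeRegressor(max_features=2)  # random subset of factors at each split
    trees.append(tree.fit(X[rows], y[rows]))

# Random Forest: Prediction = AVERAGE(Tree Predictions)
forest_prediction = np.mean([t.predict(X) for t in trees], axis=0)
```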

Page 18

A simple Random Forest example

[Charts not reproduced in this transcript]

Page 19

A simple Random Forest example (continued)

Page 20

A simple Random Forest example (continued)

[Radar chart: Random Forest profile on the six utility dimensions]

Page 21

Some machine learning methods

[Method cloud as before]

Focus on Gradient Boosting Machines

[Method cloud as before, with Gradient Boosting Machines highlighted]

Page 22

Gradient Boosting Machine or “GBM”

A tree: [all data splits on Group < 5? (Y/N), then on Age < 40? and Group < 15?]

A GBM: λ·(tree) + λ·(tree) + λ·(tree) + … — a long sum of small trees, each scaled by the learning rate λ

Four main assumptions


Learning rate / “shrinkage”

Amount by which the old model predictions are varied for the next model iteration

New model = Old + (Prediction x Learning rate)

Interaction depth

Number of splits allowed on each tree (or the number of terminal nodes – 1)

Number of trees

N, the number of trees (iterations) allowed

Bag fraction

Trees are fitted to a subset of the data (the bag fraction) on a randomized basis

Additional noise-reduction can be achieved by using a random subset of the available factors at each iteration

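The four knobs map directly onto a typical GBM implementation. A sketch with scikit-learn (illustrative values, not the presenter's settings; note that scikit-learn expresses tree size as max_depth, whereas R's gbm counts splits as interaction.depth):

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    learning_rate=0.10,   # learning rate / "shrinkage"
    max_depth=2,          # tree size, playing the role of interaction depth
    n_estimators=500,     # N, the number of trees (iterations)
    subsample=0.50,       # bag fraction: each tree sees a random half of the data
    max_features=0.80,    # optional: random subset of factors at each split
)
# gbm.fit(X_train, y_train) on your training data
```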

Page 23

A simple GBM example


GBM results at iteration 0

[Chart series: current residuals; underlying trend; current fitted values]

# factors = 1

Interaction depth = 1

Learning rate = 10%

Bag fraction = 100%
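The whole example can be reproduced in a few lines: fit a stump to the current residuals, add λ times its prediction to the fitted values, repeat. A sketch under those settings (the underlying trend here is an assumed S-curve, not the talk's actual data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x = np.arange(1, 21).reshape(-1, 1)                 # observations 1..20
trend = 1 / (1 + np.exp(-(x.ravel() - 10)))         # assumed underlying trend
y = trend + rng.normal(0, 0.15, 20)

lam, fitted = 0.10, np.zeros(20)                    # learning rate; iteration 0
for i in range(1_000):
    residuals = y - fitted                          # current residuals
    stump = DecisionTreeRegressor(max_depth=1)      # interaction depth 1
    stump.fit(x, residuals)                         # model trained on current residuals
    fitted += lam * stump.predict(x)                # incremental model update
```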


Page 24

A simple GBM example (continued)

GBM results at iterations 0 and 1 [charts not reproduced: current residuals, model trained on current residuals, incremental model update, underlying trend, current fitted values]

Page 25

A simple GBM example (continued): GBM results at iteration 1 [charts not reproduced]

Page 26

A simple GBM example (continued): GBM results at iterations 2 and 3 [charts not reproduced]

Page 27

A simple GBM example (continued): GBM results at iterations 4 and 5 [charts not reproduced]

Page 28

A simple GBM example (continued): GBM results at iterations 6 and 7 [charts not reproduced]

Page 29

A simple GBM example (continued): GBM results at iterations 8 and 9 [charts not reproduced]

Page 30

A simple GBM example (continued): GBM results at iterations 10 and 20 [charts not reproduced]

Page 31

A simple GBM example (continued): GBM results at iterations 30 and 40 [charts not reproduced]

Page 32

A simple GBM example (continued): GBM results at iterations 50 and 100 [charts not reproduced]

Page 33

A simple GBM example (continued): GBM results at iterations 200 and 300 [charts not reproduced]

Page 34

A simple GBM example (continued): GBM results at iteration 1,000 [chart not reproduced]

Calibrating the assumptions

n-fold cross validation is used to develop the interaction depth and learning rate assumptions. E.g. for 3-fold validation: split the data into three parts, fit on two, test on the held-out third, rotate through the folds, and take the average.

The resulting plots can be used to determine the optimal assumption choice, including how many trees to run.

[Diagram: three folds, each held out for testing in turn — Fit/Fit/Test, Fit/Test/Fit, Test/Fit/Fit]
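A sketch of that calibration with scikit-learn (the grid values and synthetic data are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(1_000, 4))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 1_000)

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=200, subsample=0.5),
    param_grid={"max_depth": [1, 2, 3], "learning_rate": [0.02, 0.05, 0.10]},
    cv=5, scoring="neg_mean_squared_error",
).fit(X, y)
print(search.best_params_)   # then choose the tree count at the validation-error
                             # minimum, e.g. via staged_predict on a hold-out set
```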

Page 35

Example 5-fold cross validation

[Validation-error curves by iteration for several assumption sets — chart not reproduced]

Best result shown by the brown line, as it has the lowest minimum validation error (interaction depth 2 and learning rate 2% in this case).

The minimum point shows the optimal number of trees in each case.

This example is based on artificial data – large insurance datasets indicate a larger number of trees to be optimal.


What does a GBM look like?


Page 36

What does a GBM look like? (continued) [charts not reproduced]

Page 37

Does it work? How does it work?

[Charts not reproduced]

Page 38

Deploying GBMs

Age        Exposure   Burning Cost     Vehicle Group   Exposure   Burning Cost
<=20       1,720      179              1-10            164,107    77
21-30      34,893     122              11-14           84,859     101
31-50      118,182    102              15-18           28,952     116
51+        127,054    70               19-20           3,931      272
Age Total  281,849    91               VG Total        281,849    91

Gender        Exposure   Burning Cost
Male          197,339    92
Female        84,510     87
Gender Total  281,849    91

Model down into multiplicative tables via GLMs (a sketch follows below)

Use insights to guide GLM: factor reduction, establishing model hierarchy, corner correctors and pre-baked interactions

Deploy directly
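One way to read “model down via GLMs”: refit the GBM's own predictions with a log-link GLM over banded factors, so the result is a base rate times per-level relativities. A sketch under those assumptions (synthetic data; the factor names are hypothetical, and this is not presented as the talk's exact procedure):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
risks = pd.DataFrame({"age_band": rng.integers(0, 4, 5_000),
                      "veh_group": rng.integers(0, 4, 5_000)})
burning_cost = rng.gamma(2.0, 50.0, 5_000)                    # stand-in response

gbm = GradientBoostingRegressor().fit(risks, burning_cost)    # the "black box"

# Refit the GBM's predictions with a log-link GLM on the banded factors:
# exp(intercept) is the base rate, exp(coef) the table relativities.
design = OneHotEncoder(drop="first").fit_transform(risks)
glm = PoissonRegressor(alpha=1e-6).fit(design, gbm.predict(risks))
print(np.exp(glm.intercept_), np.exp(glm.coef_))
```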

Deploying GBMs

Deploy directly: pre/post mapping, “Comfort Diagnostics” (model vs data)

[Double lift chart as before: observed, current model and proposed model response by Proposed/Current ratio band]

Page 39

[Radar chart: GBMs profile on the six utility dimensions — note the implementation in modern rating engines]

Some machine learning methods

[Method cloud as before]

Page 40

Focus on Neural Networks

[Method cloud as before, with Neural Networks highlighted]

Where is the value?

Which policyholder is more likely to make a claim?

Page 41

Where is the value?

Which picture is more likely to be of a cat?

[Images not reproduced]

Page 42

Neural networks

[Diagram not reproduced]

[Radar chart: Neural network profile on the six utility dimensions]

Page 43

Some machine learning methods

[Method cloud as before]

Focus on Penalized Regression

[Method cloud as before, with Ridge Regression, Lasso and Elastic Net highlighted]

Page 44

Penalized Regression

f(x) = g⁻¹(Xβ), where β is estimated by minimizing: deviance + penalty

GLM: no penalty

Ridge: penalty λ Σ βⱼ² — heavily penalizes large parameters, but does not reduce parameters to zero

Lasso: penalty λ Σ |βⱼ| — reduces insignificant parameter values to zero, useful for variable selection

Elastic Net: a mix of the two
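A sketch of the three penalties with scikit-learn (Gaussian case, synthetic data; the alpha and l1_ratio values are illustrative):

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 20))
beta = np.r_[np.array([3.0, -2.0, 1.5]), np.zeros(17)]      # only 3 real effects
y = X @ beta + rng.normal(0, 1.0, 1_000)

ridge = Ridge(alpha=1.0).fit(X, y)                          # shrinks, never exactly zero
lasso = Lasso(alpha=0.1).fit(X, y)                          # zeroes out insignificant factors
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)        # mix of the two

print((ridge.coef_ == 0).sum(), (lasso.coef_ == 0).sum(), (enet.coef_ == 0).sum())
```

For non-Gaussian pricing models the same penalties apply to the GLM deviance (glmnet-style penalized GLM fitting).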

Page 45


Deploying Penalized Regression

Age        Exposure   Loss Cost        Vehicle Group   Exposure   Loss Cost
<=20       1,720      179              1-10            164,107    77
21-30      34,893     122              11-14           84,859     101
31-50      118,182    102              15-18           28,952     116
51+        127,054    70               19-20           3,931      272
Age Total  281,849    91               VG Total        281,849    91

Gender        Exposure   Loss Cost
Male          197,339    92
Female        84,510     87
Gender Total  281,849    91

Page 46

[Radar chart: Penalized Regression profile on the six utility dimensions]

A toolkit…

[Radar charts comparing the toolkit on the six utility dimensions: Emblem, “Earth”, GBMs, Trees, Random Forests, Support Vector Machines, Neural Networks]

Page 47

Conclusions

[Radar chart: Emblem profile on the six utility dimensions]

What do you use where?

[Chart: the % mix of data science and domain expertise]

Page 48

It’s domain expertise that helps decide


Data science

Domain experts

Issues for the Profession(s)

Regulatory issues

TAS: Judgement - what judgement?

GDPR

UK Government Select Committee (Science and Technology)

Training

CAS, SOA ahead? (eg CSPA)

IFoA playing catch up

Is it too late?

Role of the actuary

Domain expertise matters (at least currently)

Easier for an actuary to pick up machine learning than for a data scientist to understand insurance?

Siloed teams don’t work

Familiarity and the right vernacular can help

Scope of involvement? Pricing · Reserving · Claims analytics · Customer management? · Marketing???


Page 49

Questions and Discussion
