practical predictive analytics seminar - soa · the predictive analytics & futurism section...

The Predictive Analytics & Futurism Section Presents Practical Predictive Analytics May 9, 2018 | Baltimore Marriott Waterfront | Baltimore, MD Presenters: Talex Diede, MS Jean‐Marc Fix, FSA, MAAA Brian D. Holland, FSA, MAAA Ben Johnson, MS Matthias Kullowatz, MS Richard Marshall Lagani, Jr., MA SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer


The Predictive Analytics & Futurism Section Presents 

Practical Predictive Analytics May 9, 2018 | Baltimore Marriott Waterfront | Baltimore, MD 

Presenters:
Talex Diede, MS
Jean‐Marc Fix, FSA, MAAA
Brian D. Holland, FSA, MAAA
Ben Johnson, MS
Matthias Kullowatz, MS
Richard Marshall Lagani, Jr., MA

SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer

Practical Predictive Analytics Seminar

Intro to R
Jean‐Marc Fix, FSA, MAAA
9 May 2018

SOCIETY OF ACTUARIES
Antitrust Notice for Meetings

Active participation in the Society of Actuaries is an important aspect of membership. However, any Society activity that arguably could be perceived as a restraint of trade exposes the SOA and its members to antitrust risk.  Accordingly, meeting participants should refrain from any discussion which may provide the basis for an inference that they agreed to take any action relating to prices, services, production, allocation of markets or any other matter having a market effect.  These discussions should be avoided both at official SOA meetings and informal gatherings and activities.  In addition, meeting participants should be sensitive to other matters that may raise particular antitrust concern: membership restrictions, codes of ethics or other forms of self‐regulation, product standardization or certification.  The following are guidelines that should be followed at all SOA meetings, informal gatherings and activities:

• DON’T discuss your own, your firm’s, or others’ prices or fees for service, or anything that might affect prices or fees, such as costs, discounts, terms of sale, or profit margins.

• DON’T stay at a meeting where any such price talk occurs.

• DON’T make public announcements or statements about your own or your firm’s prices or fees, or those of competitors, at any SOA meeting or activity.

• DON’T talk about what other entities or their members or employees plan to do in particular geographic or product markets or with particular customers.

• DON’T speak or act on behalf of the SOA or any of its committees unless specifically authorized to do so.

• DO alert SOA staff or legal counsel about any concerns regarding proposed statements to be made by the association on behalf of a committee or section.

• DO consult with your own legal counsel or the SOA before raising any matter or making any statement that you think may involve competitively sensitive information.

• DO be alert to improper activities, and don’t participate if you think something is improper.

• If you have specific questions, seek guidance from your own legal counsel or from the SOA’s Executive Director or legal counsel.


Presentation Disclaimer

Presentations are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of the Society of Actuaries, its cosponsors or its committees. The Society of Actuaries does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented. Attendees should note that the sessions are audio‐recorded and may be published in various media, including print, audio and video formats without further notice.

A is for Actuary
B is for Big
C is for Complex
D is for Data


What R you afraid of?


Basic R: A Programming Language!


R Studio: be a star on your own computer


A script without a movie


Graphing with ggplot2


Playing with dplyr


The black box


Pygmalion

Pygmalion by Etienne Falconet


Be a learn‐R

Practical Predictive Analytics Seminar

Session 2: Predictive Models in Life and Annuities
Matthias Kullowatz
Ben Johnson
May 9, 2018


Theory


2018 SOA

Agenda
• Questions of interest for life and annuity products
• Predictive model forms that are best suited to investigating them
• Associated theoretical concerns that may arise in the modeling process

Questions of interest
• When will a policyholder…
  • Lapse?
  • Partially withdraw?
  • Die?
• How will a policyholder utilize the policy?
• What drives these “behaviors” and why?
• Are the findings implementable?

Predictive model forms

Icon made by Freepik from www.flaticon.com

Regression
• OLS, GLM, ridge, lasso, elastic net
• Pros
  • Quick fitters
  • Interpretable coefficients and output
  • Harder to overfit
  • Widely used
• Cons
  • Constrained by functional form
  • Multicollinearity issues

Tree‐based models
• Decision trees, bagging, boosting
• Pros
  • Inherently model interactions
  • Model relationships non‐parametrically
• Cons
  • Black‐box formula
  • Don’t interpolate or extrapolate well

Clustering, et al.
• K‐means, hierarchical, k‐nearest neighbors
• Pros
  • Reduces dimensionality (clustering)
  • Easy to explain predictions (k‐nearest neighbors)
• Cons
  • Sensitive to outliers
  • Reduces dimensionality

Neural networks
• Pros
  • Inherent interaction effects/non‐parametric
  • Well‐suited for many predictor variables
    • Image recognition and text analysis type problems
• Cons
  • Black‐box formula (even more opaque than GBM/RF)
  • Estimate uncertainty harder to measure
  • Computationally intensive

Icon made by Freepik from www.flaticon.com

Logistic GLM
• For predicting probabilities of binary outcomes
• Link function provides much‐needed flexibility
• Predictor variables can be quantitative or qualitative

Why a link function?

[Figure: response (0 to 1) plotted against the predictor, with the probability p marked]

The logistic function
• $p = \dfrac{1}{1 + e^{-\eta}}$
• $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
• $\lim_{\eta \to \infty} p = 1$, $\lim_{\eta \to -\infty} p = 0$
• $\eta = \ln\dfrac{p}{1 - p}$
• Logit function (“log odds”)
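The mapping between the linear predictor and a probability can be sketched in a few lines. This is a minimal illustration in Python rather than the seminar's R script; the function names `logistic` and `logit` are chosen here for illustration.

```python
import math

def logistic(eta):
    """Map a linear predictor eta (any real number) to a probability in (0, 1)."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # this branch avoids overflow for very negative eta
    return z / (1.0 + z)

def logit(p):
    """Inverse of the logistic function: the log odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# The linear predictor is unbounded, but the probability never leaves (0, 1).
for eta in (-5.0, 0.0, 5.0):
    p = logistic(eta)
    print(f"eta={eta:+.1f}  p={p:.4f}  logit(p)={logit(p):+.4f}")
```

Applying `logit` to the output of `logistic` returns the original linear predictor, which is why the model is linear on the log-odds scale.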


Consequences of logit link


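Interpretation of coefficients
• $\ln\dfrac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
• Continuous x‐value: a one‐unit increase in $x_j$ adds $\beta_j$ to the log odds
• Odds ratio: $e^{\beta_j}$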
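The odds-ratio reading of a coefficient follows directly from the logit link. A short derivation, using the symbols $p$, $x_j$ and $\beta_j$ as notation assumed here for the probability, a continuous predictor and its coefficient:

```latex
\begin{aligned}
\mathrm{odds}(x_j) &= \frac{p}{1 - p}
  = \exp\left(\beta_0 + \beta_1 x_1 + \cdots + \beta_j x_j + \cdots + \beta_k x_k\right) \\
\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} &= \exp(\beta_j)
\end{aligned}
```

For example, the AttAge coefficient of 0.129 reported in the undersampling table later in this session corresponds to an odds ratio of $e^{0.129} \approx 1.14$, that is, roughly 14% higher odds of death per additional year of attained age, holding other predictors fixed.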

Theoretical extras
• Independent observations
• The model is fit by maximizing the following:

$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right]$

• $\mathrm{AIC} = -2\ell + 2k$
• $\mathrm{BIC} = -2\ell + k \ln n$

(k = number of fitted parameters, n = number of observations)
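As a rough sketch of these quantities, the Bernoulli log-likelihood and the two information criteria can be computed directly. The outcomes and fitted probabilities below are hypothetical, and `k` (number of fitted parameters) and `n` (number of observations) are the usual meanings assumed here.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood: sum of y*ln(p) + (1 - y)*ln(1 - p)."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def aic(loglik, k):
    """Akaike information criterion for k fitted parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return -2.0 * loglik + k * math.log(n)

# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [0, 0, 1, 0, 1]
p = [0.1, 0.2, 0.7, 0.3, 0.6]
ll = log_likelihood(y, p)
print(ll, aic(ll, k=2), bic(ll, k=2, n=len(y)))
```

Lower AIC or BIC indicates a better trade-off between fit and complexity; BIC penalizes extra parameters more heavily once ln(n) exceeds 2.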


Practical concerns


Predictive analytics process


Data Prep

Exploratory Analysis

Modeling

training/holdout test

Validation

Practical concerns: Data
• Formatting variables (1)
• Identifying and dealing with outlier data values (2)
• Accounting for missing data (2)
• Deriving new variables for modeling (3)
• Compiling the dataset into an appropriate format (4)

Practical concerns: Modeling
• Holdout dataset (2A)
• Fitting a model (2C)
• Using the step function for variable selection (2D)
• Multicollinearity concerns (2E)
• Setting reference levels for factors (DataPrep 2)
• Piecewise terms (2F)
• Undersampling (3)


Data outliers


Missing values


Missing values

Model                 NA treatment  Intercept  Height coefficient  Flag coefficient
Death ~ height        Removed       -4.418     0.0100              N/A
Death ~ height + Ind  Set to 0      -3.580     0.0100              -0.838
Death ~ height + Ind  Set to mean   -4.245     0.0100              -0.173
Death ~ height        Set to 0      -3.589     -0.0024             N/A
Death ~ height        Set to mean   -4.343     0.0095              N/A

• The first three models are mathematically equivalent
• The last two are biased
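A quick arithmetic check of the equivalence claim, assuming (the table does not state the coding) that the flag Ind equals 1 when height is observed and 0 when it is missing. Under that reading, a record with an observed height receives the intercept plus the flag coefficient, which should reproduce the intercept of the model fitted with NA rows removed; the height coefficient is 0.0100 in all three fits.

```latex
\begin{aligned}
\text{NA set to 0, with flag:}    &\quad -3.580 + (-0.838) = -4.418 \\
\text{NA set to mean, with flag:} &\quad -4.245 + (-0.173) = -4.418
\end{aligned}
```

Both sums match the intercept of the first row, so under this assumed coding the three fits give the same prediction for any record with a known height. The two models without a flag have no such term to absorb the missing records, which is where their bias comes from.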


Training versus holdout data


Cross validation

• This is a compromise for when you don’t have enough data for multiple holdout subsets

• Divide training data into random subsets
• Use each subset as a holdout dataset for validation
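The two steps above can be sketched as a small k-fold routine. This is an illustrative Python sketch, not the seminar's R code; `k_fold_indices`, `cross_validate` and the toy "model" (a training mean) are names and choices assumed here.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k, fit_and_score, seed=0):
    """Call fit_and_score(train_idx, holdout_idx) once per fold; return the scores."""
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, holdout in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train, holdout))
    return scores

# Toy usage: the "model" is the training mean, scored by holdout squared error.
data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

def fit_and_score(train, holdout):
    mean = sum(data[j] for j in train) / len(train)
    return sum((data[j] - mean) ** 2 for j in holdout) / len(holdout)

scores = cross_validate(len(data), k=3, fit_and_score=fit_and_score)
print(scores)
```

Every record is used for validation exactly once and for training k - 1 times, which is what makes the approach attractive when data is scarce.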

Stepwise model building

logodds = f(attained age) + f(cad) + f(cognitive)
logodds = f(attained age) + f(cad)
logodds = f(attained age)

Multicollinearity
• pairs()
• cor()
• vif()

         height    weight    bmi
height   1.000000  0.637640  0.052578
weight   0.637640  1.000000  0.795710
bmi      0.052578  0.795710  1.000000
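To connect the correlation matrix to variance inflation factors: a standard identity says that, for a set of predictors, the VIF of predictor j is the j-th diagonal element of the inverse of their correlation matrix. The sketch below applies that identity to the matrix printed above. It is written in Python for illustration and is not the R vif() function listed on the slide; `invert` is a small helper written here.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

names = ["height", "weight", "bmi"]
corr = [
    [1.000000, 0.637640, 0.052578],
    [0.637640, 1.000000, 0.795710],
    [0.052578, 0.795710, 1.000000],
]

inverse = invert(corr)
vif = {name: inverse[i][i] for i, name in enumerate(names)}
for name, value in vif.items():
    print(f"{name}: VIF = {value:.1f}")
```

With these inputs all three values come out well above 10, a common rule-of-thumb threshold, even though the height/bmi pairwise correlation is near zero. That is the point of checking VIF in addition to pairwise correlations: bmi is built from height and weight, so the three together are nearly redundant.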

Reference levels

[Figure: two panels, each with x‐axis levels Active, Sedentary, Average, NA]

Piecewise linear effects

[Figure, two panels with the predictor on the x‐axis from 5 to 19: “A/E by predictor before piecewise split” and “Piecewise impact of example predictor”]
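A piecewise linear effect is commonly coded by adding a "hinge" term that switches on past a knot. The sketch below illustrates the idea in Python; the knot at 11 and the slopes are hypothetical values chosen only to show the shape, since the actual values behind the charts are not recoverable from the slide.

```python
def hinge(x, knot):
    """Extra term that is zero up to the knot and grows linearly after it."""
    return max(0.0, x - knot)

def piecewise_effect(x, slope_before, slope_change, knot):
    """Contribution to the linear predictor: one slope up to the knot, another after."""
    return slope_before * x + slope_change * hinge(x, knot)

# Hypothetical knot and slopes: decreasing up to the knot, flat afterwards.
KNOT = 11
for x in range(5, 20, 2):
    effect = piecewise_effect(x, slope_before=-0.08, slope_change=0.08, knot=KNOT)
    print(x, round(effect, 3))
```

In a regression, `x` and `hinge(x, knot)` enter as two separate columns, so the model estimates the slope before the knot and the change in slope after it.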

Undersampling
• For logistic regression, undersampling can help improve runtimes:
  • All deaths (n) +
  • Randomly selected non‐deaths (3n)
• Fitting the model Death ~ AttAge

Dataset       Records  Runtime  Intercept  AttAge coefficient
Full          259,284  2.15     -14.13     0.129
Undersampled  25,152   0.12     -10.99     0.123
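The sampling scheme can be sketched as follows, in Python for illustration. The records are synthetic, and `undersample` and `intercept_offset` are names assumed here. The offset reflects a standard result: when all events are kept and non-events are sampled with fraction f, slope coefficients are approximately preserved while the intercept is shifted by -ln(f), which is consistent with the pattern in the table (similar AttAge coefficient, higher intercept in the undersampled fit).

```python
import math
import random

def undersample(records, is_event, ratio=3, seed=0):
    """Keep every event plus `ratio` times as many randomly chosen non-events.

    Returns the sample and the fraction of non-events that was kept.
    """
    events = [r for r in records if is_event(r)]
    non_events = [r for r in records if not is_event(r)]
    keep = min(len(non_events), ratio * len(events))
    sampled = random.Random(seed).sample(non_events, keep)
    return events + sampled, keep / len(non_events)

def intercept_offset(sampling_fraction):
    """Amount to add to the undersampled intercept to undo the non-event sampling."""
    return math.log(sampling_fraction)

# Synthetic records (attained_age, death_flag), for illustration only.
rng = random.Random(1)
records = [(rng.randint(40, 90), 1 if rng.random() < 0.03 else 0)
           for _ in range(5000)]
sample, fraction = undersample(records, is_event=lambda r: r[1] == 1)
print(len(records), len(sample), round(intercept_offset(fraction), 3))
```

The offset is negative because fewer than all non-events are kept, so the corrected intercept is lower than the one fitted on the undersampled data.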


Validation


Validation and comparison
• Overall model fit (4A)
  • Bias‐variance tradeoff
• Comparison between two candidate models (4B)

Model fit
• R²
• Log‐likelihood/AIC/BIC
• Actual‐to‐expected plots (4A‐i)
• Confusion matrix (4A‐ii)
• AUC (4A‐iii)

Confusion matrix
• Select a threshold for predicting the outcome
• Build a 2x2 contingency table

            Death
Prediction  0       1      Total
0           65,815  835    66,650
1           18,500  1,313  19,813
Total       84,315  2,148  86,463

True positive rate = 1,313/2,148 = 0.611 (1 - Type‐II error)
False positive rate = 18,500/84,315 = 0.219 (Type‐I error)
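The two rates can be recomputed from the table counts with a short sketch. It is written in Python for illustration; `confusion_counts` and `rates` are names chosen here, and the rule "predict 1 when the fitted probability is at or above the threshold" is an assumed convention.

```python
# Counts copied from the table above, keyed by (prediction, actual death).
counts = {(0, 0): 65815, (0, 1): 835, (1, 0): 18500, (1, 1): 1313}

def confusion_counts(y_true, p_hat, threshold):
    """Tally (prediction, actual) pairs, predicting 1 when p_hat >= threshold."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for y, p in zip(y_true, p_hat):
        table[(1 if p >= threshold else 0, y)] += 1
    return table

def rates(table):
    """Return (true positive rate, false positive rate)."""
    tp, fn = table[(1, 1)], table[(0, 1)]
    fp, tn = table[(1, 0)], table[(0, 0)]
    return tp / (tp + fn), fp / (fp + tn)

tpr, fpr = rates(counts)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Lowering the threshold moves records from the prediction-0 row to the prediction-1 row, raising both rates together.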

Area under the curve (AUC)
• The curve here is the relationship of the true positive rate and false positive rate as the threshold moves from 0 to 1
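A minimal sketch of how the curve and its area can be computed, in Python with hypothetical labels and scores. The code sweeps the threshold from high to low rather than from 0 to 1; that traces the same curve in the opposite direction. `roc_points` and `auc` are names chosen here.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs as the threshold sweeps from the highest score downwards."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points ordered by FPR."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical outcomes and model scores, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
print(round(auc(roc_points(y_true, scores)), 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a model that scores every event above every non-event.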

Model comparison: Lift charts
• Actual to expected (4B)
• Two‐way lift (4B)


Thank you!


Practical Predictive Analytics Seminar

Session 3: Machine Learning Topics
Talex Diede, MS
May 9, 2018

GLM review
• Linear model
• Interpretable
• Issues:
  • Multicollinearity
  • Variable selection
  • Variable importance
  • Interactions

Why machine learning?
• Data continues to grow
• Powerful
• Flexible
• Computational enhancements
  • Cheaper
  • More available
• It’s sexy

Machine learning techniques
• Regularization methods
• Classification and regression trees
• Ensemble models
• Others:
  • Clustering
  • Bayesian
  • Neural network
  • Deep learning


Regularization Methods

What is “regularization”?
• Regularization is a technique used to avoid the problem of overfitting. The idea is to add a complexity term to the loss function to penalize more complex models.

Regularization methods
• Ridge regression
• LASSO
• ElasticNet
• In R:
  • Packages: glmnet, MASS, ridge, lars, elasticnet, …

Ridge regression
• Weight decay
• L2‐norm penalty
• $\sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right] - \lambda \sum_{j=1}^{k} \beta_j^2$

LASSO
• Least absolute shrinkage and selection operator
• L1‐norm penalty
• $\sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right] - \lambda \sum_{j=1}^{k} |\beta_j|$

ElasticNet
• Convex combination of ridge and LASSO
• L2 & L1‐norm penalties
• $\sum_{i=1}^{n} \left[ y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \right] - \lambda \left[ (1 - \alpha) \sum_{j=1}^{k} \beta_j^2 + \alpha \sum_{j=1}^{k} |\beta_j| \right]$
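The three penalized objectives differ only in the penalty term, which a short sketch can make concrete. This is an illustrative Python version with hypothetical data and coefficients. Two choices here are assumptions: the intercept is left unpenalized (a common convention), and the elastic net is written as a plain convex combination with weight `alpha`. R's glmnet uses a slightly different scaling of the same idea (a factor of 1/2 on the L2 term and a log-likelihood averaged over observations).

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def log_likelihood(y, X, beta0, betas):
    """Bernoulli log-likelihood of a logistic model with intercept beta0."""
    total = 0.0
    for yi, row in zip(y, X):
        p = logistic(beta0 + sum(b * x for b, x in zip(betas, row)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total

def penalty(betas, lam, alpha):
    """Elastic net penalty: alpha=0 gives ridge (L2), alpha=1 gives LASSO (L1)."""
    l2 = sum(b * b for b in betas)
    l1 = sum(abs(b) for b in betas)
    return lam * ((1.0 - alpha) * l2 + alpha * l1)

def penalized_objective(y, X, beta0, betas, lam, alpha):
    """Quantity to maximize: log-likelihood minus the complexity penalty."""
    return log_likelihood(y, X, beta0, betas) - penalty(betas, lam, alpha)

# Hypothetical toy data and coefficients, for illustration only.
y = [0, 1, 1, 0]
X = [[0.5, 1.0], [1.5, -0.5], [2.0, 0.0], [0.1, 0.3]]
beta0, betas = -1.0, [1.0, -2.0]
for name, alpha in (("ridge", 0.0), ("LASSO", 1.0), ("elastic net", 0.5)):
    value = penalized_objective(y, X, beta0, betas, lam=0.5, alpha=alpha)
    print(name, round(value, 4))
```

Larger `lam` pulls coefficients toward zero; the L1 term can set some of them exactly to zero, which is why LASSO doubles as a variable selection method.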

Aside: Cross‐Validation
• Useful for smaller datasets

[Figure: training data divided into six folds, numbered 1 to 6]


Classification and Regression Trees (CART)

Trees
• Sequence of questions/rules for splitting the data
• Elements of CART algorithms
  • Rules for splitting data at each node
  • Stopping criteria
  • Prediction for the target variable

[Figure: example tree node with N = 350; class 0 = 200/350, class 1 = 150/350]

Classification vs regression
• Classification trees: used for categorical or binary target variables
  • Predict the category a policy will fall into
• Regression trees: used for continuous target variables
  • Predict the value of the continuous target

Splitting nodes
• Goal: choose the split that results in nodes with maximum homogeneity
• Classification: “Impurity” function
  • Entropy
  • Misclassification rate
  • Gini index
  • Twoing
• Regression: Squared residuals minimization
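Two of the impurity functions listed above can be sketched directly from class counts. The root counts below are the 200/150 split of the N = 350 example node shown earlier; the two candidate splits are hypothetical, made up here to show how a split is scored. The code is illustrative Python, not the seminar's R script.

```python
import math

def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits: minus the sum of p * log2(p) over non-empty classes."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_impurity(left, right, impurity=gini):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n_left, n_right = sum(left), sum(right)
    return (n_left * impurity(left) + n_right * impurity(right)) / (n_left + n_right)

root = [200, 150]  # class counts [0, 1] in the example node
# Two hypothetical candidate splits of that node: (left counts, right counts).
split_a = ([150, 30], [50, 120])
split_b = ([110, 80], [90, 70])

print("root gini:", round(gini(root), 4))
for name, (left, right) in (("A", split_a), ("B", split_b)):
    print(name, round(split_impurity(left, right), 4))
```

The algorithm prefers the split with the lowest weighted impurity; here that is split A, whose children are far more homogeneous than those of split B.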

Stopping rules
• Depth
• Size
• Number of nodes
• Complexity parameter


Ensemble Models

Overview
• What:
  • An ensemble model is the aggregation of two or more related but different models, averaged into a single prediction.
• Why:
  • Improve accuracy of predictions
  • Improve stability of the model

Ensemble methods
• Bagging
• Boosting
• Stacking

Bagging
• What is it:
  • Build multiple models from different subsamples of the training dataset, then combine their results for the final prediction.
• Helps to reduce the variance component of prediction error
• Example:
  • Random Forest
  • R package: randomForest, …
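A minimal sketch of bagging, in Python for illustration. The base learner is a one-split regression tree ("stump") on a single predictor, a deliberate simplification; a random forest uses full trees and also samples predictors at each split. The data is synthetic and the function names are chosen here.

```python
import random

def fit_stump(points):
    """Best single-split regression tree on 1-D data: (threshold, left_mean, right_mean)."""
    xs = sorted(set(x for x, _ in points))
    overall = sum(y for _, y in points) / len(points)
    best = (float("inf"), xs[0], overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def bagged_stumps(points, n_models=50, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement, same size as the data)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points]) for _ in range(n_models)]

def bagged_predict(models, x):
    """Average the individual model predictions."""
    return sum(predict_stump(m, x) for m in models) / len(models)

# Synthetic data: a noisy step from about 0 to about 1 at x = 10.
noise = random.Random(42)
points = [(x, (1.0 if x >= 10 else 0.0) + noise.gauss(0.0, 0.1)) for x in range(20)]
models = bagged_stumps(points)
print(round(bagged_predict(models, 2), 3), round(bagged_predict(models, 17), 3))
```

Each stump sees a slightly different sample and picks a slightly different split; averaging them smooths out that instability, which is the variance reduction the slide refers to.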

Boosting
• What is it:
  • Build multiple models in sequence, each of which is built to reduce the prediction errors of the prior model
• Has shown better predictive accuracy than bagging, but is more likely to overfit
• Example:
  • Gradient Boosted Machines (GBM)
  • R packages: gbm, xgboost, …
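A minimal sketch of boosting for squared-error loss, in Python for illustration. Each round fits a one-split regression tree ("stump") to the current residuals and adds a shrunken copy of it to the ensemble. The learning rate, number of rounds and data are hypothetical; packages such as gbm and xgboost implement far more complete versions of this idea.

```python
def fit_stump(points):
    """Best single-split regression tree on 1-D data: (threshold, left_mean, right_mean)."""
    xs = sorted(set(x for x, _ in points))
    overall = sum(y for _, y in points) / len(points)
    best = (float("inf"), xs[0], overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def boost(points, n_rounds=20, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the remaining residuals."""
    base = sum(y for _, y in points) / len(points)
    residuals = [y - base for _, y in points]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump([(x, r) for (x, _), r in zip(points, residuals)])
        stumps.append(stump)
        residuals = [r - learning_rate * predict_stump(stump, x)
                     for (x, _), r in zip(points, residuals)]
    return base, learning_rate, stumps

def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

def mse(points, predict):
    return sum((y - predict(x)) ** 2 for x, y in points) / len(points)

# Hypothetical data: a parabola that no single stump can fit well.
points = [(x, float((x - 5) ** 2)) for x in range(11)]
model = boost(points)
mse_base = mse(points, lambda x: model[0])
mse_boost = mse(points, lambda x: boosted_predict(model, x))
print(round(mse_base, 2), round(mse_boost, 2))
```

Because every round chases the training residuals, training error keeps falling as rounds are added; that is also why boosting overfits more readily than bagging and is usually tuned against holdout data.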

Stacking
• What is it:
  • Building multiple models, typically different types of models, then having a supervisor model that determines how to best combine those results


Back to R!


Final Thoughts

Weighing your options
• Implementation
• Explanation
• Cost

[Figure labels: MAGIC; Log Odds; β1 X1, β2 X2, β3 X3]

Other considerations
• Actuarial judgment
• Model selection
• Data issues
• Hardware/Software

[Figure labels: MALE, FEMALE, NA]


Now you’re on your way!
