
2017 Predictive Analytics Symposium

Session 9, Programming in R

Moderator: Nicholas Scott Hanewinckel, FSA, CERA

Presenters:

Melissa Carruthers, FSA, FCIA Benjamin Johnson

SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer


R U UP On R?

Melissa Carruthers, FSA, FCIA
Manager, Deloitte
September 2017


© Deloitte LLP and affiliated entities.

Agenda

• An evolving landscape
• Introducing R
• Getting started


An evolving landscape


In any given minute…

639,800 GB of global IP data is transferred

• 1.3 million videos

• 2+ million topics searched

• 47,000 application downloads


BIG DATA in life insurance


The power of predictive analytics

Using predictive analytics we can estimate the likelihood of a person having certain health characteristics and developing future ailments.

• Slow caffeine metabolizer (drinking coffee increases chance of heart attack)
• She is pregnant and the baby is likely to weigh 2 ounces less than average at birth
• 8% chance of developing rheumatoid arthritis
• 1% chance of developing age-related macular degeneration
• 10% chance of developing breast cancer
• Average odds of having hay fever
• Average odds of developing ovarian cancer
• Tendency toward higher BMI
• Greater tendency to overeat
• 2% chance of developing chronic kidney disease
• 0.1% chance of developing type 1 diabetes
• Average odds of getting gout
• Strong chance of developing severe nearsightedness
• < 2% chance of developing Parkinson’s disease
• Has wet earwax
• 4% chance of developing melanoma
• Average odds of developing uterine fibroids
• Low odds for high blood pressure
• Average odds of developing glaucoma
• Average sensitivity to sweaty odors
• Average odds of developing esophageal cancer
• Average odds of developing pancreatic cancer
• Low risk of developing gestational diabetes during pregnancy
• 5% chance of developing MS
• Likely lactose intolerant


Changing the way we work

Sampling of Applications of Predictive Analytics in Core Operations for Life Insurers

Core operations: Product Design & Pricing; Sales and Marketing; New Business & Underwriting; Inforce Management; Claims and Fraud; Producer Optimization

• Cross-Sell Programs: Identify existing customers who are likely to need and likely to buy a second product (an annuity, a P&C product, etc.). Deploy customized, targeted offers.
• Producer Recruitment: Identify the individuals most likely to become successful producers for a given manufacturer.
• Producer Retention: Segment existing producers and deploy customized tactics to support success and retention.
• Producer-Client Matching: Identify behavioral patterns and personality attributes associated with successful, lasting producer-client relationships; deploy tactics to optimize matches.
• Fraud Detection: Identify potential over-payments of claims for LTC or related products.
• Underwriting: Predict mortality experience on a seriatim basis, using new data sources to supplement or replace certain traditional medical exams.
• Application Triage: Identify healthy individuals for whom certain medical exams can be waived.
• Up-Sell Programs: Identify existing customers whose need for life insurance has increased and who remain healthy. Offer an increased face amount with limited underwriting.
• Target Marketing / Lead Generation: Improve the quality of leads by identifying those most likely to qualify and most likely to buy.
• Post-Level Term Offers: Segment the population based on current health risk, current life insurance needs and likelihood to buy. Deploy customized, targeted offers.
• LTC Claims Management (Active Lives): For each active life, estimate the likelihood of developing certain cognitive or physical impairments, then proactively encourage healthy policyholder behavior to enable prevention.
• LTC Claims Management (Disabled Lives): For each disabled life, estimate the likelihood of transitions between types of impairment (physical vs. cognitive) and the associated level of care required (home health care, assisted care facility, nursing home), then proactively encourage healthy policyholder behavior.
• Customer Lifetime Value: Enable calculation of a customized individual CLV; deploy customized proactive tactics for retention, second offers, etc.
• Retention Strategy: Use a customized, individual estimate of lapse likelihood to enable customized proactive and reactive tactics that improve retention effectiveness.


Ready for the opportunities ahead?


Introducing R


Overview of R

Map: 2 million users, shown in blue

Profile

• R: a statistical programming language

• Developed in 1994

• Originated at the University of Auckland, New Zealand

• Created by Ross Ihaka & Robert Gentleman


Overview of R


Pros and Cons of R

Pros
• Free
• Open source
• Visualization
• Package ecosystem
• Strong community

Cons
• Memory management
• Learning curve


R has 9,153 packages built and ready for use

“If you’re trying to do something that’s not in the code set, you go out and find an R package ... and then you snap it right in and start using it.”

– Robert Sudol, Sr. Development Manager in Fixed Income Technology, AllianceBernstein
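Here is a minimal sketch of that "snap it in" workflow. `install.packages()` downloads a package from CRAN once, and `library()` attaches it in each session. The helper name `use_package()` is hypothetical and is not part of R or of any package.

```r
# Hypothetical helper: install a CRAN package only if it is missing,
# then attach it to the current session.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # attach so its functions can be called directly
  invisible(pkg)
}

# "stats" ships with R, so this call never touches the network.
use_package("stats")
# use_package("ChainLadder")  # would install from CRAN on first use
```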


Predictive modelling in R


Useful packages for actuaries

• actuar: Actuarial Functions and Heavy Tailed Distributions
• ActuDistns: Functions for actuarial scientists
• CompLognormal: Functions for actuarial scientists
• ChainLadder: Statistical Methods and Models for Claims Reserving in General Insurance
• lifecontingencies: Financial and Actuarial Mathematics for Life Contingencies
• raw: R Actuarial Workshops
• tweedie: Tweedie exponential family models, useful for modeling pure premiums
• insuranceData: A Collection of Insurance Datasets Useful in Risk Classification in Non-life Insurance
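To illustrate the kind of calculation a package like ChainLadder automates, here is a hand-rolled base-R sketch of the basic volume-weighted chain-ladder method. It is not the ChainLadder API, and the triangle values are made up for illustration only. The package itself goes further, for example with `MackChainLadder()` for reserve variability.

```r
# Hypothetical cumulative paid-claims triangle:
# rows = accident years, columns = development years, NA = not yet observed.
tri <- matrix(c(100, 150, 175,
                110, 168,  NA,
                120,  NA,  NA),
              nrow = 3, byrow = TRUE)

# Volume-weighted age-to-age factors: for each step j -> j + 1, use only
# the accident years observed at both development ages.
dev_factors <- function(tri) {
  sapply(seq_len(ncol(tri) - 1), function(j) {
    observed <- !is.na(tri[, j + 1])
    sum(tri[observed, j + 1]) / sum(tri[observed, j])
  })
}

# Fill the lower triangle by rolling each latest value forward with the factors.
project_triangle <- function(tri, f) {
  for (j in seq_along(f)) {
    missing <- is.na(tri[, j + 1])
    tri[missing, j + 1] <- tri[missing, j] * f[j]
  }
  tri
}

f <- dev_factors(tri)              # (150 + 168) / (100 + 110), then 175 / 150
full <- project_triangle(tri, f)
latest <- apply(tri, 1, function(row) tail(row[!is.na(row)], 1))  # paid to date
reserve <- sum(full[, ncol(full)]) - sum(latest)                   # ultimate minus paid
```

With this toy triangle the age-to-age factors are about 1.514 and 1.167, and the indicated reserve is 120.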


Getting started


Installing R

Obtaining R
• R is available for the Unix-like, Windows and Mac families of operating systems
• To download R, go to www.r-project.org and click on CRAN
• The R Core Team supports the use of R for commercial purposes

CRAN
• Click on CRAN
• The Comprehensive R Archive Network (CRAN) is a collection of sites with material on the R distribution, extensions, documentation and binaries
• https://CRAN.R-project.org/

Mirrors
• After clicking on CRAN you will be asked to select a mirror
• This is the location that you will download R from
• You should select the location nearest you

Machine
• Options to download R for Linux, Mac or Windows
• Download the most recent version of R
• Your computer will download the installer package

Run & Install
• Run and install as you would any other application
• It is usually reasonable to accept the default options, selecting and continuing throughout
• Select the standard installation
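After installation, you can also choose the mirror from code, so `install.packages()` does not prompt for one. This minimal sketch assumes CRAN's cloud mirror, `https://cloud.r-project.org`, which redirects to a nearby server. The version threshold is a hypothetical example.

```r
# Point R at a CRAN mirror so install.packages() does not ask for one.
# Any mirror listed at https://cran.r-project.org/mirrors.html would work too.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which version of R was installed.
R.version.string
getRversion() >= "3.4.0"   # hypothetical minimum version check

# install.packages("lifecontingencies")  # would now download from the mirror above
```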


R Distribution Manuals

• An Introduction to R: includes information on data types, programming elements, statistical modeling and graphics
• Writing R Extensions: describes the process of creating R add-on packages, writing R documentation, R’s system and foreign-language interfaces, and the R API
• R Data Import/Export: guide to importing and exporting data to and from R
• The R Language Definition: a first version of the “Kernighan & Ritchie of R”; explains evaluation, parsing, object-oriented programming and computing on the language
• R Internals: guide to R’s internal structures

• Online documentation is available for most functions and variables within the R prompt as well as in pdf/html format online

• For each set of manuals sold, the publisher donates $10 USD to the R Foundation


Get practicing

Use existing datasets and packages in R to get started easily

Call existing datasets

Look up info “datasets”

Select the dataset called AirPassengers

Summary Stats
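The steps on this slide take only a few lines at the console. This minimal sketch uses the built-in `AirPassengers` series, which holds monthly totals of international airline passengers from 1949 to 1960, in thousands. It ships with R's `datasets` package.

```r
# Objects in the built-in "datasets" package are available without loading anything.
"AirPassengers" %in% ls("package:datasets")   # TRUE

# help(AirPassengers)           # opens the documentation page (interactive use)
# library(help = "datasets")    # lists every bundled dataset

ap <- AirPassengers
class(ap)        # "ts": a time series object
frequency(ap)    # 12 observations per year
summary(ap)      # minimum, quartiles, mean and maximum
```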


Working in RStudio

R code editor: where you will do the majority of your work; run portions or all of your R code

Interactive console: allows you to type R statements one line at a time; shows the output of code run in the editor

Workspace: list of objects in memory; History tab with a list of prior commands

• Visuals/ plots

• List of available packages

• Files in working directory

• Help files
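Each RStudio pane has a console equivalent, which is useful when working outside RStudio. A minimal sketch follows. The object `x` is only an example, and calls that need an interactive session are commented out.

```r
x <- c(1.5, 2.5, 3.5)    # creating an object adds it to the Workspace/Environment pane

ls()                     # objects in memory (Workspace pane)
getwd()                  # current working directory (Files pane)
list.files()             # files in the working directory
head(rownames(installed.packages()))  # some installed packages (Packages pane)

# savehistory("my_history.Rhistory")  # prior commands (History tab); interactive only
# help("mean")                        # Help pane
# plot(x)                             # Plots pane
```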


Demos in R

R has readily available demos to help with the learning curve
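This minimal sketch finds and runs the bundled demos with `demo()`. Demos pause between plots, so the actual run is guarded by `interactive()`.

```r
# List the demos shipped with the base "graphics" package.
graphics_demos <- demo(package = "graphics")
graphics_demos$results[, "Item"]   # demo names such as "graphics", "image", "persp"

# demo()  # lists demos in all attached packages

# Running a demo waits for keypresses between plots, so only do it interactively.
if (interactive()) {
  demo("graphics", package = "graphics")
}
```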


Act now and gain the competitive advantage of using R


Predictive Analytics Symposium

Ben Johnson, MS
Session 9: Programming in R
September 14, 2017


Why choose R over others?

• Free
  • Install R on any computer
• Open source
  • Share as much as you want
• Large community
  • Easily find support online


The O’Jays


R is a tool, not the chest

• Don’t join the argument of R or Excel, R or Python, R or SAS

• This isn’t Team Edward vs Team Jacob


Team R (He’s a “vampie-R”)


When R isn’t the best thing ever

• Restrictive limitations in computing power
  • R can require a lot of local memory
• You have a need for speed
  • Some operations are faster using a language like C++
• Quick and easy, on-the-fly stuff
  • We didn’t invent the calculator to replace our fingers for counting
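You can check the memory and speed caveats directly. In this minimal base-R sketch, `object.size()` reports an object's memory footprint, and `system.time()` compares an explicit loop with a vectorized equivalent. Vectorization is only the first remedy. Moving a hot loop into C++, for example via the Rcpp package, is the step the slide alludes to, and it is not shown here.

```r
# How much memory does an object take? A million doubles need about 8 MB.
x <- runif(1e6)
print(object.size(x), units = "MB")

# Explicit loop: one R-level operation per element.
sum_sq_loop <- function(v) {
  total <- 0
  for (value in v) total <- total + value^2
  total
}

# Vectorized version: the loop runs inside R's compiled internals.
sum_sq_vec <- function(v) sum(v^2)

system.time(loop_result <- sum_sq_loop(x))
system.time(vec_result  <- sum_sq_vec(x))
all.equal(loop_result, vec_result)   # same answer; the vectorized call is typically much faster

rm(x)                                # release the object once it is no longer needed
invisible(gc())
```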


Start using R


Advice for starting with R

• Install a good GUI
  • RStudio
  • Jupyter
  • Deducer
  • RKWard
  • Rcmdr
  • etc.


Advice for starting with R

• Document scripts well
  • Comments follow #
  • R Projects help segment code
• Manage parentheses
  • RStudio helps a lot
• Adopt an R coding style
  • Google’s R Style Guide
  • Hadley Wickham’s Style Guide
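Here is a small example of a documented, consistently styled function. It loosely follows the tidyverse style guide's conventions: snake_case names, `<-` for assignment, and spaces around operators. The function name is hypothetical.

```r
# Convert an effective annual interest rate to the equivalent effective
# monthly rate, i.e. solve (1 + m)^12 = 1 + i for m.
annual_to_monthly <- function(annual_rate) {
  (1 + annual_rate)^(1 / 12) - 1
}

monthly_rate <- annual_to_monthly(0.05)
```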


Einstein’s notebook


Advice for starting with R

• Differentiate between variable assignment methods
  • =, <-, -> and assign() each tell R, “this is true!”
  • == asks R, “is this true?”
• Use subsets of your data while writing code
• R is case sensitive


data Data DATA

“Hi Data!”

“Hello data!”

“Wow, DATA!! I haven’t seen you in forever.”

“I can’t believe how different we all are.”
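This sketch shows the assignment forms, the `==` test, and case sensitivity side by side. The subset at the end uses the built-in `mtcars` data as a stand-in for your own data.

```r
x <- 10          # the usual assignment operator
20 -> y          # right-hand assignment also works
z = 30           # `=` also assigns at the top level
assign("w", 40)  # assignment by name, handy when names are built programmatically

x == 10          # `==` only tests equality (TRUE/FALSE); it assigns nothing

# R is case sensitive: these are three separate objects.
data <- 1
Data <- 2
DATA <- 3
c(data, Data, DATA)

# Develop code against a small random subset, then rerun on the full data.
set.seed(42)
dev_rows <- sample(nrow(mtcars), 10)
mtcars_dev <- mtcars[dev_rows, ]
```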


Advice for starting with R

• When in doubt… Google, Google, Google
  • Stack Overflow is your new best friend
• Don’t reinvent the horse, use libraries
  • Find help info for a function using help()
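Here are a few base-R ways to find help without leaving the console. Calls that open a help viewer are commented out, so the sketch also runs as a script.

```r
# help() or ? opens the documentation for a function you know by name.
# help("read.csv")
# ?read.csv

# help.search() or ?? searches all installed documentation by keyword.
# help.search("linear model")

# apropos() finds functions whose names match a regular expression.
apropos("^read\\.")        # includes "read.csv" and "read.table"

# args() shows a function's arguments and defaults without opening a help page.
args(read.csv)
formals(read.csv)$header   # TRUE: read.csv expects a header row by default

# example() runs the examples at the bottom of a help page.
# example(mean)
```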


Working with data in R


Understand that variables have class

• Numeric
  • is.numeric()
• Character
  • as.character()
• Date
  • as.Date()
• NA values
  • is.na(), anyNA()
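This minimal sketch checks and converts classes. The date is the session date from the title slide, and the ages are made-up values.

```r
class(3.14)                  # "numeric"
is.numeric(3.14)             # TRUE

as.character(2017)           # "2017": a number converted to text

session_date <- as.Date("2017-09-14")   # ISO yyyy-mm-dd format by default
class(session_date)          # "Date"
session_date + 30            # "2017-10-14": date arithmetic works in days

ages <- c(45, NA, 62)
is.na(ages)                  # FALSE TRUE FALSE
anyNA(ages)                  # TRUE
mean(ages)                   # NA: missing values propagate
mean(ages, na.rm = TRUE)     # 53.5
```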


Data structure

• Save data as RDS or RData to preserve information
• Save data as CSV to explore it in an Excel spreadsheet


      Homogeneous      Heterogeneous
1d    Atomic vector    List
2d    Matrix           Data frame
nd    Array
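This minimal sketch creates one object of each structure in the table, then saves a small data frame both ways. The policy records are hypothetical. The RDS round trip returns an identical object. The CSV round trip loses the `Date` class, which illustrates why RDS or RData preserves more information.

```r
v  <- c(1, 2, 3)                                          # 1d homogeneous: atomic vector
l  <- list(id = 1L, name = "A", rates = c(0.01, 0.02))    # 1d heterogeneous: list
m  <- matrix(1:6, nrow = 2)                               # 2d homogeneous: matrix
a  <- array(1:24, dim = c(2, 3, 4))                       # nd homogeneous: array
df <- data.frame(                                         # 2d heterogeneous: data frame
  policy_id   = 1:3,
  issue_date  = as.Date(c("2015-01-01", "2016-06-30", "2017-09-14")),
  face_amount = c(100000, 250000, 50000)
)

# RDS keeps classes (such as Date) and attributes exactly.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)
identical(df, df_rds)                 # TRUE

# CSV is convenient for Excel, but column types must be re-parsed on the way back.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path)
class(df_csv$issue_date)              # no longer "Date" ("character" in R >= 4.0)
```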

Page 37: 2017 Predictive Analytics Symposium - SOA › globalassets › assets › Files › e... · Using predictive analytics we can determine the likelihood of a person having certain health

Useful functions for data exploration

• dim(), nrow(), ncol()
  • Identify dimensions of data
• head() and tail()
  • Preview first or last rows of data
• summary() and table()
  • Descriptive statistics of data
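Here are these functions applied to the built-in `mtcars` data (32 cars, 11 variables), used only as a convenient example.

```r
dim(mtcars)          # 32 11: rows and columns
nrow(mtcars)         # 32
ncol(mtcars)         # 11

head(mtcars, 3)      # first 3 rows
tail(mtcars, 3)      # last 3 rows

summary(mtcars$mpg)  # minimum, quartiles, mean and maximum
table(mtcars$cyl)    # cars by cylinder count: 11 with 4, 7 with 6, 14 with 8
```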


Creating reports with R


rmarkdown


• Notebook style code

• PDF or HTML output

• Compiles top to bottom

• Reproducible research
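This minimal sketch shows the R Markdown workflow. It assumes the rmarkdown package, and the pandoc tool it relies on, are installed. The report title and content are placeholders. The script writes a tiny `.Rmd` file and, if possible, renders it to HTML with `rmarkdown::render()`.

```r
# A minimal R Markdown source: YAML header, prose, and one R code chunk.
rmd_lines <- c(
  "---",
  "title: \"Report sketch\"",
  "output: html_document",
  "---",
  "",
  "Chunks run top to bottom, so every result can be regenerated from the code.",
  "",
  "```{r}",
  "summary(mtcars$mpg)",
  "```"
)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Render only if rmarkdown and pandoc are available; returns the output file path.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
}
```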


shiny


ggplot2

• Especially helpful in visualizing grouped data
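This minimal sketch plots grouped data with ggplot2 and assumes the package is installed. It maps one grouping variable to colour and splits panels by another with `facet_wrap()`. The built-in `mtcars` data is only a stand-in.

```r
p <- NULL
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Colour encodes cylinder count; facet_wrap() draws one panel per transmission type.
  p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +
    facet_wrap(~ am) +
    labs(x = "Weight (1,000 lbs)", y = "Miles per gallon", colour = "Cylinders")
}
# print(p)  # draws the plot
```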


formattable

• Conditional formatting
  • Either pre-defined or custom functions


DT


• Search and filter data
• Display larger tables


Demo in R


Find the full code on GitHub

https://github.com/milliman/PASymp-Programming-in-R


Loading the data


Fitting the model


Validation plots
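The demo's actual code is in the GitHub repository above. Independently of it, the three steps (load, fit, validate) can be sketched in base R. This sketch assumes a Poisson GLM on the built-in `warpbreaks` data as a stand-in. That is a hypothetical choice, not the demo's model or data.

```r
# 1. Load the data: built-in stand-in (number of warp breaks per loom)
dat <- warpbreaks
str(dat)

# Hold out roughly 30% of rows, stratified by tension so every level is in both sets
set.seed(2017)
train_rows <- unlist(lapply(split(seq_len(nrow(dat)), dat$tension),
                            function(idx) sample(idx, round(0.7 * length(idx)))))
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

# 2. Fit a Poisson GLM with a log link for the count of breaks
fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"), data = train)
summary(fit)

# 3. Validate on the holdout: actual-to-expected (A/E) ratios by tension level
test$predicted <- predict(fit, newdata = test, type = "response")
ae_by_tension <- tapply(test$breaks, test$tension, sum) /
  tapply(test$predicted, test$tension, sum)
ae_by_tension  # ratios near 1 suggest the model is well calibrated for that level

# Validation plot: actual vs. predicted, with the 45-degree line for reference
plot(test$predicted, test$breaks,
     xlab = "Predicted breaks", ylab = "Actual breaks",
     main = "Holdout: actual vs. predicted")
abline(a = 0, b = 1, lty = 2)
```

A Poisson GLM with an intercept reproduces the total observed count on its training data, so checking A/E on the holdout set is the more informative test.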
