2017 Predictive Analytics Symposium

Session 9, Programming in R

Moderator: Nicholas Scott Hanewinckel, FSA, CERA

Presenters:

Melissa Carruthers, FSA, FCIA

Benjamin Johnson

SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer

R U UP On R?

Melissa Carruthers, FSA, FCIA
Manager, Deloitte
September 2017

© Deloitte LLP and affiliated entities.

Agenda

• An evolving landscape
• Introducing R
• Getting started

An evolving landscape

In any given minute…

639,800 GB of global IP data is transferred

• 1.3 million videos

• 2+ million topics searched

• 47,000 application downloads

BIG DATA in life insurance

The power of predictive analytics

Using predictive analytics we can determine the likelihood of a person having certain health characteristics and developing future ailments.

• Slow caffeine metabolizer (drinking coffee increases chance of heart attack)
• She is pregnant and the baby is likely to weigh 2 ounces less than average at birth
• 8% chance of developing rheumatoid arthritis
• 1% chance of developing age-related macular degeneration
• 10% chance of developing breast cancer
• Average odds of having hay fever
• Average odds of developing ovarian cancer
• Tendency toward higher BMI
• Greater tendency to overeat
• 2% chance of developing chronic kidney disease
• 0.1% chance of developing type 1 diabetes
• Average odds of getting gout
• Strong chance of developing severe nearsightedness
• < 2% chance of developing Parkinson’s disease
• Has wet earwax
• 4% chance of developing melanoma
• Average odds of developing uterine fibroids
• Low odds for high blood pressure
• Average odds of developing glaucoma
• Average sensitivity to sweaty odors
• Average odds of developing esophageal cancer
• Average odds of developing pancreatic cancer
• Low risk of developing gestational diabetes during pregnancy
• 5% chance of developing MS
• Likely lactose intolerant

Changing the way we work

Sampling of Applications of Predictive Analytics in Core Operations for Life Insurers

Core operations: Product Design & Pricing; Sales and Marketing; New Business & Underwriting; Inforce Management; Claims and Fraud; Producer Optimization

• Cross-Sell Programs: Identify existing customers who are likely to need and likely to buy a second product (an annuity, a P&C product, etc.). Deploy customized, targeted offers.
• Up-Sell Programs: Identify existing customers whose need for life insurance has increased, and who remain healthy. Offer increased face amount with limited underwriting.
• Target Marketing / Lead Generation: Improve quality of leads by identifying those most likely to qualify and most likely to buy.
• Post-Level Term Offers: Segment population based on current health risk, current life insurance needs, and likelihood to buy. Deploy customized, targeted offers.
• Underwriting: Predict mortality experience on a seriatim basis, using new data sources to supplement or replace certain traditional medical exams.
• Application Triage: Identify certain healthy individuals for whom certain medical exams can be waived.
• Customer Lifetime Value: Enable calculation of customized individual CLV; deploy customized proactive tactics for retention, second offers, etc.
• Retention Strategy: Use a customized, individual estimate of lapse likelihood to enable customized proactive and reactive tactics to improve retention effectiveness.
• Fraud Detection: Identify potential over-payments of claims for LTC or related products.
• LTC Claims Management (Active Lives): For each active life, estimate the likelihood of developing certain cognitive or physical impairments, then proactively encourage healthy policyholder behavior to enable prevention.
• LTC Claims Management (Disabled Lives): For each disabled life, estimate the likelihood of transitions between type of impairment (physical vs. cognitive) and associated level of care required (home health care, assisted care facility, nursing home), then proactively encourage healthy policyholder behavior.
• Producer Recruitment: Identify individuals most likely to become a successful producer for a given manufacturer.
• Producer Retention: Segment existing producers and deploy customized tactics to support success and retention.
• Producer-Client Matching: Identify behavioral patterns and personality attributes associated with successful, lasting producer-client relationships; deploy tactics to optimize matches.

Ready for the opportunities ahead?

Introducing R

Overview of R

2 million users (shown in blue on the map)

Profile

• R: statistical programming language

• Developed in 1994

• Originated at the University of Auckland, New Zealand

• Created by Ross Ihaka & Robert Gentleman

Pros and Cons of R

• Free
• Open source
• Visualization
• Memory management
• Learning curve
• Package ecosystem
• Strong community

R has 9,153 packages built and ready for use

“If you’re trying to do something that’s not in the code set, you go out and find an R package ... and then you snap it right in and start using it,”

-Robert Sudol, Sr. Development Manager in Fixed Income Technology, AllianceBernstein.
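
As a rough sketch of what "snapping in" a package looks like in practice, the snippet below checks whether a package is installed and attaches it if so. The package name `ChainLadder` is only an illustrative choice, and the install line is left commented out because it needs network access.

```r
# Illustrative package name (an assumption); any CRAN package works the same way.
pkg <- "ChainLadder"

# requireNamespace() returns TRUE if the package is installed, without attaching it.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (!is_installed) {
  message("Package '", pkg, "' is not installed yet.")
  # install.packages(pkg)  # uncomment to download and install from CRAN
} else {
  # Attach the package so its functions can be called directly.
  library(pkg, character.only = TRUE)
}
```

Once a package is attached, `help(package = "ChainLadder")` lists its functions and documentation.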

Predictive modelling in R

Useful packages for actuaries

Package: Description

• actuar: Actuarial functions and heavy-tailed distributions
• ActuDistns: Functions for actuarial scientists
• CompLognormal: Functions for actuarial scientists
• ChainLadder: Statistical methods and models for claims reserving in general insurance
• lifecontingencies: Financial and actuarial mathematics for life contingencies
• raw: R actuarial workshops
• tweedie: Tweedie exponential family models, which are useful for modeling pure premiums
• insuranceData: A collection of insurance datasets useful in risk classification in non-life insurance

Getting started

Installing R

Obtaining R

• R is available for the Unix-like, Windows and Mac families
• To download R, go to www.r-project.org and click on CRAN
• The R Core Team supports the use of R for commercial purposes

CRAN

• Click on CRAN
• The Comprehensive R Archive Network (CRAN) is a collection of sites with material on the R distribution, extensions, documentation and binaries
• https://CRAN.R-project.org/

Mirrors

• After clicking on CRAN you will be asked to select a mirror
• This is the location that you will download R from
• You should select the location nearest you

Machine

• Options to download R for Linux, Mac or Windows
• Download the most recent version of R
• Your computer will download the installer package

Run & Install

• Run and install as you would any other application
• It is usually reasonable to accept the default options; select and continue throughout
• Select the standard installation
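
After installation, a quick check from the R console can confirm which version is running and which CRAN mirror the session will use. This is a minimal sketch; the mirror URL below is one example chosen for illustration, not a recommendation from the slides.

```r
# Print the version of the R installation that is currently running.
print(R.version.string)

# Show which CRAN mirror (if any) this session is configured to use.
print(getOption("repos"))

# Pin a mirror for the rest of the session (example URL, an assumption).
options(repos = c(CRAN = "https://cloud.r-project.org"))
```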

R Distribution Manuals

Manual: Description

• An Introduction to R: Includes information on data types, programming elements, statistical modeling and graphics
• Writing R Extensions: Describes the process of creating R add-on packages, writing R documentation, R’s system and foreign language interfaces, and the R API
• R Data Import/Export: Guide to importing and exporting data to and from R
• The R Language Definition: First version of the “Kernighan & Ritchie of R”; explains evaluation, parsing, object-oriented programming and computing on the language
• R Internals: Guide to R’s internal structures

• Online documentation is available for most functions and variables within the R prompt as well as in pdf/html format online

• For each set of manuals sold, the publisher donates $10 USD to the R Foundation

Get practicing

Use existing datasets and packages in R to easily get started

• Call existing datasets
• Look up info on “datasets”
• Select the dataset called AirPassengers
• Summary stats
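
The steps above might look like the following at the console. This is a sketch using only base R and the built-in `datasets` package; the help lookup is guarded so it only runs in an interactive session.

```r
# List the datasets that ship with the built-in "datasets" package.
available <- data(package = "datasets")
head(available$results[, c("Item", "Title")])

# Look up information about the "datasets" package (interactive sessions only).
if (interactive()) {
  library(help = "datasets")
}

# Load the AirPassengers dataset, a monthly time series.
data("AirPassengers")
class(AirPassengers)

# Summary statistics: minimum, quartiles, mean and maximum.
summary(AirPassengers)
```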

Working in R-Studio

• R code editor: where you will do the majority of your work; run portions or all of your R code

• Interactive console: lets you type R statements one line at a time and shows the output of code run in the editor

• Workspace: list of objects in memory, plus a history tab with a list of prior commands

• Visuals/ plots

• List of available packages

• Files in working directory

• Help files

Demos in R

R has readily available demos to help with the learning curve
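
As a small illustration, the built-in `demo()` function lists and runs these demos. The sketch below lists the demos in the `graphics` package and runs one only in an interactive session, since it steps through a series of plots.

```r
# List the demos that ship with the graphics package.
graphics_demos <- demo(package = "graphics")
print(graphics_demos$results[, c("Item", "Title")])

# Run one demo; guarded because it opens plot windows and pauses between them.
if (interactive()) {
  demo("graphics", package = "graphics", ask = TRUE)
}
```

Calling `demo()` with no arguments lists the demos for all attached packages.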

Act now and gain the competitive advantage of using R

Predictive Analytics Symposium

Ben Johnson, MS
Session 9: Programming in R
September 14, 2017

Why choose R over others?

• Free
  • Install R on any computer
• Open source
  • Share as much as you want
• Large community
  • Easily find support online

The O’Jays

R is a tool, not the chest

• Don’t join the argument of R or Excel, R or Python, R or SAS

• This isn’t Team Edward vs Team Jacob

Team R (He’s a “vampie-R”)

When R isn’t the best thing ever

• Restrictive limitation in computing power
  • R can require a lot of local memory
• You have a need for speed
  • Some operations are faster using a language like C++
• Quick and easy, on-the-fly stuff
  • We didn’t invent the calculator to replace our fingers for counting

Start using R

Advice for starting with R

• Install a good GUI
  • RStudio
  • Jupyter
  • Deducer
  • RKWard
  • Rcmdr
  • etc.

Advice for starting with R

• Document scripts well
  • Comments follow #
  • R Projects help segment code
• Manage parentheses
  • RStudio helps a lot
• Adopt an R coding style
  • Google’s R Style Guide
  • Hadley Wickham’s Style Guide

Einstein’s notebook

Advice for starting with R

• Differentiate between variable assignment methods
  • =, <-, ->, assign() each tell R, “this is true”!
  • == asks R, “is this true”?
• Use subsets of your data while writing code
• R is case sensitive
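
A short sketch of the difference between assigning and comparing, and of case sensitivity. The variable names are purely illustrative.

```r
# Four ways to assign the value 10 to a name.
x1 = 10
x2 <- 10
10 -> x3
assign("x4", 10)

# == compares values and returns TRUE or FALSE; it does not assign anything.
x1 == x2    # TRUE
x1 == 11    # FALSE

# R is case sensitive: these are three different objects.
data <- 1
Data <- 2
DATA <- 3
identical(data, Data)   # FALSE
```

Most style guides prefer `<-` for assignment and reserve `=` for function arguments.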

[Cartoon: three objects named data, Data and DATA greet each other. “Hi Data!” “Hello data!” “Wow, DATA!! I haven’t seen you in forever.” “I can’t believe how different we all are.”]

Advice for starting with R

• When in doubt… Google, Google, Google
  • Stack Overflow is your new best friend
• Don’t reinvent the horse, use libraries
  • Find help info for a function using help()
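
For instance, the built-in help tools can be used as sketched below. Only base R and `utils` functions are used; `mean` and `read.csv` are arbitrary example topics.

```r
# help() locates the help page for a function; at the console, typing
# help("mean") or ?mean displays it.
h <- help("mean")
print(length(h) > 0)   # TRUE when a help page was found

# Search help titles and keywords when the function name is unknown
# (left commented out because it can be slow the first time it runs).
# help.search("linear model")

# Find functions on the search path whose names match a pattern.
apropos("^read\\.csv")
```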

Working with data in R

Understand that variables have class

• Numeric
  • is.numeric()
• Character
  • as.character()
• Date
  • as.Date()
• NA values
  • is.na(), anyNA()
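
The functions named above can be tried on a few toy values. This is a minimal sketch; the values themselves are made up for illustration.

```r
# Numeric values
x <- 42.5
class(x)         # "numeric"
is.numeric(x)    # TRUE

# Convert a number to a character string
s <- as.character(x)
class(s)         # "character"

# Convert an ISO-formatted string to a Date
d <- as.Date("2017-09-14")
class(d)         # "Date"

# Missing values
v <- c(1, NA, 3)
is.na(v)               # FALSE TRUE FALSE
anyNA(v)               # TRUE
mean(v, na.rm = TRUE)  # 2
```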

Data structure

• Save data as RDS or RData to preserve information
• Save data as CSV to explore it in an Excel spreadsheet
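
To illustrate what "preserve information" can mean, the sketch below round-trips a small data frame through both formats. The column names and values are hypothetical, and files are written to temporary paths.

```r
# A small, made-up data frame with a Date column and a factor column.
policies <- data.frame(
  id         = 1:3,
  issue_date = as.Date(c("2017-01-15", "2017-06-30", "2017-09-14")),
  smoker     = factor(c("N", "Y", "N"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# RDS keeps R-specific information such as Date and factor classes.
saveRDS(policies, rds_path)
from_rds <- readRDS(rds_path)
class(from_rds$issue_date)   # "Date"

# CSV is plain text: easy to open in a spreadsheet, but classes are lost.
write.csv(policies, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)
class(from_csv$issue_date)   # no longer "Date"
```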

      Homogeneous      Heterogeneous
1d    Atomic vector    List
2d    Matrix           Data frame
nd    Array
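
Each structure in the table can be created with a base R constructor. The following is a minimal sketch with made-up values.

```r
# 1d, homogeneous: atomic vector (all elements share one type)
v <- c(1.5, 2.5, 3.5)

# 1d, heterogeneous: list (elements may differ in type)
l <- list(1.5, "a", TRUE)

# 2d, homogeneous: matrix
m <- matrix(1:6, nrow = 2)

# 2d, heterogeneous: data frame (columns may differ in type)
df <- data.frame(age = c(30, 45), sex = c("F", "M"))

# nd, homogeneous: array
arr <- array(1:24, dim = c(2, 3, 4))

# Mixing types in an atomic vector coerces everything to a single type.
class(c(1, "a"))   # "character"

dim(m)             # 2 3
dim(arr)           # 2 3 4
sapply(df, class)  # class of each data frame column
```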

Useful functions for data exploration

• dim(), nrow(), ncol()
  • Identify dimensions of data
• head() and tail()
  • Preview first or last rows of data
• summary() and table()
  • Descriptive statistics of data
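
These functions can be tried on any data frame. The sketch below uses the built-in `mtcars` dataset, which is an arbitrary choice for illustration.

```r
# Dimensions of the data
dim(mtcars)    # 32 11
nrow(mtcars)   # 32
ncol(mtcars)   # 11

# Preview the first and last rows
head(mtcars, 3)
tail(mtcars, 3)

# Descriptive statistics for a numeric column
summary(mtcars$mpg)

# Frequency counts for a column with few distinct values
table(mtcars$cyl)
```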

Creating reports with R

rmarkdown

• Notebook style code

• PDF or HTML output

• Compiles top to bottom

• Reproducible research
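
As a hedged illustration of the workflow, the sketch below writes a tiny R Markdown file to a temporary path and renders it to HTML if the `rmarkdown` package and pandoc are available. The document contents are made up, and the code-chunk fences are built with `strrep()` only to keep this example readable.

```r
# Three backticks, used to delimit the code chunk inside the .Rmd file.
fence <- strrep("`", 3)

# Contents of a minimal R Markdown document (illustrative only).
rmd_lines <- c(
  "---",
  "title: \"AirPassengers summary\"",
  "output: html_document",
  "---",
  "",
  "Monthly airline passenger counts, summarised below.",
  "",
  paste0(fence, "{r}"),
  "summary(AirPassengers)",
  "plot(AirPassengers)",
  fence
)

rmd_path <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_path)

# Render top to bottom; skipped when rmarkdown or pandoc is missing.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_path, quiet = TRUE)
  message("Report written to ", out)
} else {
  message("rmarkdown or pandoc not available; skipping render.")
}
```

In RStudio the same result is usually reached by opening the .Rmd file and clicking Knit.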

shiny

ggplot2

• Especially helpful in visualizing grouped data
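
A minimal sketch of a grouped plot, assuming the `ggplot2` package is installed. The built-in `mtcars` dataset and the choice of grouping by cylinder count are illustrative only.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Colour and facet by number of cylinders to compare groups side by side.
  p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point() +
    facet_wrap(~ cyl) +
    labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

  print(p)
} else {
  message("ggplot2 is not installed; skipping the plot.")
}
```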

formattable

• Conditional formatting
  • Either pre-defined or custom functions
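
A small sketch of both kinds of formatter, assuming the `formattable` package is installed. The data, colours and the threshold of 100 are arbitrary choices for illustration.

```r
# A few columns of a built-in dataset, chosen only for illustration.
cars_subset <- head(mtcars[, c("mpg", "cyl", "hp")])

if (requireNamespace("formattable", quietly = TRUE)) {
  library(formattable)

  ft <- formattable(cars_subset, list(
    # Pre-defined formatter: shade cells from white to orange by value.
    mpg = color_tile("white", "orange"),
    # Custom formatter: red text when horsepower exceeds 100.
    hp = formatter("span",
                   style = x ~ style(color = ifelse(x > 100, "red", "black")))
  ))

  print(class(ft))
} else {
  message("formattable is not installed; skipping the example.")
}
```

Printing `ft` itself in RStudio or an R Markdown document shows the formatted table.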

DT

• Search and filter data
• Display larger tables
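
A minimal sketch of an interactive table, assuming the `DT` package is installed. The built-in `iris` dataset and the page length of 10 are illustrative choices.

```r
if (requireNamespace("DT", quietly = TRUE)) {
  # filter = "top" adds a per-column filter row above the table;
  # a global search box is shown by default.
  widget <- DT::datatable(iris, filter = "top", options = list(pageLength = 10))

  print(class(widget))
} else {
  message("DT is not installed; skipping the example.")
}
```

Printing `widget` in RStudio, a Shiny app or an R Markdown HTML document displays the table.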

Demo in R

Find the full code on GitHub

https://github.com/milliman/PASymp-Programming-in-R

Loading the data

Fitting the model

Validation plots
