TRANSCRIPT
2017 Predictive Analytics Symposium
Session 9, Programming in R
Moderator: Nicholas Scott Hanewinckel, FSA, CERA
Presenters:
Melissa Carruthers, FSA, FCIA
Benjamin Johnson
SOA Antitrust Compliance Guidelines SOA Presentation Disclaimer
R U UP On R?
Melissa Carruthers, FSA, FCIA
Manager, Deloitte
September 2017
© Deloitte LLP and affiliated entities.
Agenda
• An evolving landscape
• Introducing R
• Getting started
An evolving landscape
In any given minute…
639,800 GB of global IP data is transferred
• 1.3 million videos
• 2+ million topics searched
• 47,000 application downloads
BIG DATA in life insurance
The power of predictive analytics
Using predictive analytics, we can determine the likelihood of a person having certain health characteristics and developing future ailments.
• Slow caffeine metabolizer (drinking coffee increases chance of heart attack)
• She is pregnant and the baby is likely to weigh 2 ounces less than average at birth
• 8% chance of developing rheumatoid arthritis
• 1% chance of developing age-related macular degeneration
• 10% chance of developing breast cancer
• Average odds of having hay fever
• Average odds of developing ovarian cancer
• Tendency toward higher BMI
• Greater tendency to overeat
• 2% chance of developing chronic kidney disease
• 0.1% chance of developing type 1 diabetes
• Average odds of getting gout
• Strong chance of developing severe nearsightedness
• < 2% chance of developing Parkinson’s disease
• Has wet earwax
• 4% chance of developing melanoma
• Average odds of developing uterine fibroids
• Low odds for high blood pressure
• Average odds of developing glaucoma
• Average sensitivity to sweaty odors
• Average odds of developing esophageal cancer
• Average odds of developing pancreatic cancer
• Low risk of developing gestational diabetes during pregnancy
• 5% chance of developing MS
• Likely lactose intolerant
Changing the way we work
Sampling of Applications of Predictive Analytics in Core Operations for Life Insurers
Core operations: Product Design & Pricing, Sales and Marketing, New Business & Underwriting, Inforce Management, Claims and Fraud, Producer Optimization
• Cross-Sell Programs: Identify existing customers who are likely to need and likely to buy a second product (an annuity, a P&C product, etc.). Deploy customized, targeted offers.
• Up-Sell Programs: Identify existing customers whose need for life insurance has increased, and who remain healthy. Offer increased face amount with limited underwriting.
• Target Marketing / Lead Generation: Improve quality of leads by identifying those most likely to qualify and most likely to buy.
• Post-Level Term Offers: Segment population based on current health risk, current life insurance needs and likelihood to buy. Deploy customized, targeted offers.
• Underwriting: Predicting mortality experience on a seriatim basis, using new data sources to supplement or replace certain traditional medical exams.
• Application Triage: Identifying certain healthy individuals for which certain medical exams can be waived.
• Customer Lifetime Value: Enable calculation of customized individual CLV; deploy customized proactive tactics for retention, second offers, etc.
• Retention Strategy: Use customized, individual estimate of lapse likelihood to enable customized proactive and reactive tactics to improve retention effectiveness.
• Fraud Detection: Identify potential over-payments of claims for LTC or related products.
• LTC Claims Management (Active Lives): For each active life, estimate the likelihood of developing certain cognitive or physical impairments, then proactively encourage healthy policyholder behavior to enable prevention.
• LTC Claims Management (Disabled Lives): For each disabled life, estimate the likelihood of transitions between type of impairment (physical vs. cognitive) and associated level of care required (home health care, assisted care facility, nursing home), then proactively encourage healthy policyholder behavior.
• Producer Recruitment: Identification of individuals most likely to become a successful producer for a given manufacturer.
• Producer Retention: Segmenting existing producers and deploying customized tactics to support success and retention.
• Producer-Client Matching: Identify behavioral patterns and personality attributes associated with successful, lasting producer-client relationships; deploy tactics to optimize matches.
Ready for the opportunities ahead?
Introducing R
Overview of R
2 million users (shown in blue on the map)
Profile
• R: statistical programming language
• Developed in 1994
• Originated at the University of Auckland, New Zealand
• Created by Ross Ihaka & Robert Gentleman
Pros and Cons of R
• Free
• Open source
• Visualization
• Memory management
• Learning curve
• Package ecosystem
• Strong community
R has 9,153 packages built and ready for use
“If you’re trying to do something that’s not in the code set, you go out and find an R package ... and then you snap it right in and start using it,”
-Robert Sudol, Sr. Development Manager in Fixed Income Technology, AllianceBernstein.
Predictive modelling in R
Useful packages for actuaries
Package             Description
actuar              Actuarial Functions and Heavy Tailed Distributions
ActuDistns          Functions for actuarial scientists
CompLognormal       Functions for actuarial scientists
ChainLadder         Statistical Methods and Models for Claims Reserving in General Insurance
lifecontingencies   Financial and Actuarial Mathematics for Life Contingencies
raw                 R Actuarial Workshops
tweedie             Tweedie exponential family models, which are useful for modeling pure premiums
insuranceData       A Collection of Insurance Datasets Useful in Risk Classification in Non-life Insurance
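As a rough sketch of how a package from the list above gets "snapped in": the snippet below checks which of three of the listed packages are already installed. The choice of these three is only an illustration, and the install line is left commented out because it needs a CRAN connection.

```r
# Packages taken from the list above (package names are case sensitive)
pkgs <- c("actuar", "ChainLadder", "lifecontingencies")

# requireNamespace() returns TRUE when a package is installed and loadable
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Install whatever is missing from CRAN (uncomment to run; needs internet access)
# install.packages(pkgs[!installed])

# Once installed, attach a package to use its functions
# library(actuar)
```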
Getting started
Installing R
Obtaining R
• R is available for Unix-like, Windows and Mac platforms
• To download R, go to www.r-project.org and click on CRAN
• The R Core Team supports the use of R for commercial purposes
CRAN
• Click on CRAN
• The Comprehensive R Archive Network (CRAN) is a collection of sites with material on the R distribution, extensions, documentation and binaries
• https://CRAN.R-project.org/
Mirrors
• After clicking on CRAN, you will be asked to select a mirror
• This is the location that you will download R from
• You should select the location nearest you
Machine
• Options to download R for Linux, Mac or Windows
• Download the most recent version of R
• Your computer will download the installer package
Run & Install
• Run and install as you would any other application
• It is usually reasonable to accept the default options; select and continue throughout
• Select the standard installation
R Distribution Manuals
Manual                      Description
An Introduction to R        Includes information on data types, programming elements, statistical modeling and graphics
Writing R Extensions        Describes the process of creating R add-on packages, writing R documentation, R’s system and foreign language interfaces, and the R API
R Data Import/Export        Guide to importing and exporting data to and from R
The R Language Definition   First version of the “Kernighan & Ritchie of R”; explains evaluation, parsing, object-oriented programming and computing on the language
R Internals                 Guide to R’s internal structures
• Online documentation is available for most functions and variables within the R prompt as well as in pdf/html format online
• For each set of manuals sold, the publisher donates $10 USD to the R Foundation
Get practicing
Use existing datasets and packages in R to easily get started
• Call existing datasets
• Look up info on “datasets”
• Select the dataset called AirPassengers
• Summary stats
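The steps on this slide can be sketched in a few lines of base R. This is an illustration rather than the exact commands shown in the session; the object names `available` and `air_summary` are made up for the example.

```r
# Call existing datasets: list what ships in the built-in "datasets" package
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Look up info on "datasets" (opens a help index, so only do it interactively)
if (interactive()) library(help = "datasets")

# Select the dataset called AirPassengers (a monthly time series)
class(AirPassengers)
head(AirPassengers, 12)

# Summary stats
air_summary <- summary(AirPassengers)
print(air_summary)
```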
Working in RStudio
R code editor: Where you will do the majority of your work; run portions or all of your R code
Interactive console: Allows you to type R statements one line at a time; shows output of code run in the editor
Workspace: List of objects in memory; history tab with a list of prior commands
• Visuals/ plots
• List of available packages
• Files in working directory
• Help files
Demos in R
R has readily available demos to help with the learning curve
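A small sketch of how the built-in demos can be found and run. The choice of the graphics package is only an example; the variable name `demos` is made up for the illustration.

```r
# List the demos that ship with the graphics package
demos <- demo(package = "graphics")$results
print(demos[, c("Item", "Title")])

# Run one of them; demos step through plots, so only do this interactively
if (interactive()) demo("graphics", package = "graphics")
```

Calling `demo()` with no arguments lists the demos in every attached package.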
Act now and gain a competitive advantage by using R
Predictive Analytics Symposium
Ben Johnson, MS
Session 9: Programming in R
September 14, 2017
Why choose R over others?
• Free
  • Install R on any computer
• Open source
  • Share as much as you want
• Large community
  • Easily find support online
The O’Jays
R is a tool, not the chest
• Don’t join the argument of R or Excel, R or Python, R or SAS
• This isn’t Team Edward vs Team Jacob
Team R (He’s a “vampie-R”)
When R isn’t the best thing ever
• Restrictive limits on computing power
  • R can require a lot of local memory
• You have a need for speed
  • Some operations are faster using a language like C++
• Quick and easy, on-the-fly stuff
  • We didn’t invent the calculator to replace our fingers for counting
Start using R
Advice for starting with R
• Install a good GUI
  • RStudio
  • Jupyter
  • Deducer
  • RKWard
  • Rcmdr
  • etc.
Advice for starting with R
• Document scripts well
  • Comments follow #
  • R Projects help segment code
• Manage parentheses
  • RStudio helps a lot
• Adopt an R coding style
  • Google’s R Style Guide
  • Hadley Wickham’s Style Guide
Einstein’s notebook
Advice for starting with R
• Differentiate between variable assignment methods
  • =, <-, ->, assign() each tell R, “this is true!”
  • == asks R, “is this true?”
• Use subsets of your data while writing code
• R is case sensitive
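The points above can be sketched as follows. All object names (`a`, `b`, `c_val`, `d`, `dev_sample`) are invented for the illustration, and `mtcars` stands in for whatever dataset you are working with.

```r
# Four ways to tell R "this is true": each line assigns the value 10
a = 10
b <- 10
10 -> c_val
assign("d", 10)

# == asks R "is this true?" and returns TRUE or FALSE
all_equal <- (a == b) && (b == c_val) && (c_val == d)
print(all_equal)

# R is case sensitive: these are three different objects
data <- 1
Data <- 2
DATA <- 3
print(identical(data, Data))

# Use a subset of the data while writing code, then switch to the full set
dev_sample <- head(mtcars, 5)
print(dim(dev_sample))
```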
[Cartoon: three variables named data, Data and DATA meet. “Hi Data!” “Hello data!” “Wow, DATA!! I haven’t seen you in forever.” “I can’t believe how different we all are.”]
Advice for starting with R
• When in doubt… Google, Google, Google
  • Stack Overflow is your new best friend
• Don’t reinvent the horse, use libraries
  • Find help info for a function using help()
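A brief sketch of looking things up from inside R. The functions searched for here are arbitrary examples; `matches` is a name made up for the illustration.

```r
# Help pages open in a viewer, so call them from an interactive session
if (interactive()) {
  help("summary")          # help page for one function
  ?summary                 # shorthand for the same thing
  help(package = "stats")  # index of everything in a package
}

# When you only remember part of a name, apropos() searches attached packages
matches <- apropos("mean")
print(matches)
```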
Working with data in R
Understand that variables have class
• Numeric
  • is.numeric()
• Character
  • as.character()
• Date
  • as.Date()
• NA values
  • is.na(), anyNA()
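A minimal sketch of the class functions named above, using invented values (`premium`, `policy_id` and `issue_date` are hypothetical names, not data from the session).

```r
# Numeric vector with one missing value
premium <- c(120.5, 98, NA, 143.25)
print(is.numeric(premium))
print(class(premium))

# Convert numbers to character, e.g. for identifiers that are not quantities
policy_id <- as.character(c(1001, 1002))
print(class(policy_id))

# Convert text to a Date so that date arithmetic works
issue_date <- as.Date("2017-09-14")
print(issue_date + 30)

# Find missing values, and skip them when summarising
print(is.na(premium))
print(anyNA(premium))
mean_premium <- mean(premium, na.rm = TRUE)
print(mean_premium)
```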
Data structure
• Save data as RDS or RData to preserve information
• Save data as CSV to explore it in an Excel spreadsheet

      Homogeneous     Heterogeneous
1d    Atomic vector   List
2d    Matrix          Data frame
nd    Array
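The table and the two saving options can be sketched like this. The small data frame is invented for the example, and the files are written to temporary paths so nothing is left behind.

```r
# One object of each structure in the table
vec <- c(1, 2, 3)                      # 1d, homogeneous: atomic vector
lst <- list(1, "a", TRUE)              # 1d, heterogeneous: list
mat <- matrix(1:6, nrow = 2)           # 2d, homogeneous: matrix
arr <- array(1:24, dim = c(2, 3, 4))   # nd, homogeneous: array
df <- data.frame(                      # 2d, heterogeneous: data frame
  id = 1:3,
  issued = as.Date(c("2017-01-01", "2017-06-15", "2017-09-14"))
)

# RDS keeps column classes intact
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)
print(class(from_rds$issued))

# CSV is easy to open in a spreadsheet, but the Date class is lost on re-import
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
print(class(from_csv$issued))
```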
Useful functions for data exploration
• dim(), nrow(), ncol()
  • Identify dimensions of data
• head() and tail()
  • Preview first or last rows of data
• summary() and table()
  • Descriptive statistics of data
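A quick sketch of these functions on the built-in `mtcars` data frame, which is used here only as a stand-in for your own data.

```r
# Identify dimensions of the data
print(dim(mtcars))
print(nrow(mtcars))
print(ncol(mtcars))

# Preview the first and last rows
print(head(mtcars, 3))
print(tail(mtcars, 3))

# Descriptive statistics for a numeric column
print(summary(mtcars$mpg))

# Counts for a categorical column (number of cars by cylinder count)
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
```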
Data
Creating reports with R
rmarkdown
• Notebook style code
• PDF or HTML output
• Compiles top to bottom
• Reproducible research
shiny
ggplot2
• Especially helpful in visualizing grouped data
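A minimal sketch of plotting grouped data, assuming the ggplot2 package is installed (the code skips itself otherwise). The built-in `iris` data is a stand-in chosen for the example; it is not the data used in the session.

```r
# Only run when ggplot2 is available (assumption: installed from CRAN beforehand)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Colour and one panel per group make the three species easy to compare
  p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species)) +
    geom_point() +
    facet_wrap(~ Species) +
    labs(x = "Sepal length", y = "Sepal width", title = "Iris measurements by species")

  print(p)
}
```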
formattable
• Conditional formatting
  • Either pre-defined or custom functions
DT
• Search and filter data
• Display larger tables
Demo in R
Find the full code on GitHub
https://github.com/milliman/PASymp-Programming-in-R
Loading the data
Fitting the model
Validation plots
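The demo code itself is in the GitHub repository linked above and is not reproduced in this transcript. As a stand-in, here is a generic base R sketch of the same three steps. It is not the presenters' code: the dataset (`mtcars`), the model formula and the 24/8 split are all assumptions chosen for illustration.

```r
# Loading the data: mtcars ships with R and stands in for a real modelling dataset
model_data <- mtcars

# Hold out a quarter of the rows for validation
set.seed(2017)
train_rows <- sample(nrow(model_data), size = 24)
train <- model_data[train_rows, ]
holdout <- model_data[-train_rows, ]

# Fitting the model: a Gaussian GLM of fuel economy on weight and horsepower
fit <- glm(mpg ~ wt + hp, data = train, family = gaussian())
print(summary(fit))

# Validation: predict on the holdout rows and compare with actual values
holdout$predicted <- predict(fit, newdata = holdout, type = "response")
rmse <- sqrt(mean((holdout$mpg - holdout$predicted)^2))
print(rmse)

# Validation plot: points near the dashed 45-degree line are well predicted
plot(holdout$predicted, holdout$mpg,
     xlab = "Predicted mpg", ylab = "Actual mpg",
     main = "Holdout: predicted vs. actual")
abline(a = 0, b = 1, lty = 2)
```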